In this article, an efficient and scalable distributed web crawler system based on Hadoop is designed and implemented. The paper first briefly introduces the application of cloud computing in the web crawler field; then, according to the current status of crawler systems, it uses Hadoop's distributed and cloud computing features in the detailed design of a highly scalable crawler system,...
Cloud computing technology is a new paradigm which provides Information Technology (IT) resources via the Internet. This new shift in the way that IT resources are offered to the user brings new challenges, such as cloud service discovery. Nowadays, cloud users are faced with a dilemma as they have an abundant choice of cloud services. Moreover, many cloud providers offer a range of services which...
Provisioning cloud applications usually is a complex task as it involves the deployment and configuration of several components (e.g., load balancer, application server, database) and cloud services (computing, storage, CDN, etc.) also known as application blueprints or topologies. The Topology and Orchestration Specification for Cloud Applications (TOSCA) is a recent standard that has focused on...
Several research projects are under way to apply cloud computing to industrial systems. Their main focus is the real-time performance of virtual machines (VMs), since it is important to guarantee the time-critical behavior of industrial systems. However, another important issue is how much computing resource (CPU, memory, etc.) should be allocated to each VM which runs processes of an...
Information safety is significant for state security, especially for intelligence service. OSIA (open source intelligence analyzing) system based on cloud computing and domestic platform is designed and implemented in this paper. For the sake of the security and utility of OSIA, all of the middleware and involved OS are compatible with domestic software. OSIA system concentrates on analyzing open...
Web crawlers work on behalf of applications or services to find interesting and related information on the web. For example, search engines use web crawlers to index the Internet. Web crawlers face several challenges, such as the complexity of relationships between links and the highly intensive computation required when a web crawler retrieves complex connected links. Another issue is the storage of a massive...
Internet-based communication defines two main types of services as Pull and Push services, depending on the side that sends the request for transmission of information. In contrast to Pull services, whose request for transmission is initiated by the client, Push service denotes a type of transmission where the request for a given exchange of information is initiated by the publisher or central server...
The newly proposed Alert Notification Service (ANS) is a web service that automatically visits all web sites selected by a given user and alerts the user when a certain keyword phrase has changed. This saves users time and effort by reducing the need to repeatedly visit multiple web sites looking for specific information or keywords. In comparison with other systems of the...
Internet traffic is experiencing explosive growth, and online shopping is one of the significant drivers. However, alert network operators, unwilling to be dumb pipes, are making every effort to mine mass traffic with the help of Deep Packet Inspection (DPI), which is regarded as a big challenge, especially for massive data, when traditional methods and programming models are utilized. Hadoop provides...
We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud. Data is crawled using a scholarly focused crawler that leverages a dynamic scheduler, processed using a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and is stored...
In the face of large amounts of complicated web information, a professional and individualized web crawling system is required for users to acquire information effectively. In this context, a cloud-based web crawling system is proposed, which can improve the efficiency of application development and reduce its maintenance costs. However, it also poses security risks for application developers. To...
Cloud services have emerged as one of the most important parts of a company. Amazon, Rackspace, Google, and Microsoft, to name a few, all fight to gain a foothold as cloud service providers. CB-Cloudle, a search engine aiming to discover the available options of cloud services and to suggest the most appropriate alternatives, is presented here to meet end users' needs. In this work, this software...
We propose a non-intrusive approach for monitoring virtual machines (VMs) in the cloud. At the core of this approach is a mechanism for selective real-time monitoring of guest file updates within VM instances. This mechanism is agentless, requiring no guest VM support. It has low virtual I/O overhead, low latency for emitting file updates, and a scalable design. Its central design principle is distributed...
Cloud services have unique characteristics, including dynamic and diverse service offerings at different levels, few standardized description languages, and varied deployment platforms. Searching such services is thus challenging. The authors' cloud service crawler engine collects metadata about 5,883 cloud services over the Web after parsing more than half a million possible links. An extensive statistical...
Over the past few years, Cloud computing has been receiving much attention as a new computing paradigm for providing flexible and on-demand infrastructures, platforms and software as services. In Cloud computing, challenges in searching cloud services need to be renewed due to a number of unique characteristics of cloud services such as the dynamic, diverse services offering at different levels, as...
This paper aims to implement a crawler engine or search engine using cloud computing infrastructure. This approach uses virtual machines on a cloud computing infrastructure to run the crawler service engines and also the application servers. Based on our initial experiments, this research has successfully built a crawler engine that runs on a Virtual Machine (VM) of cloud computing infrastructure...
Provisioning and maintenance of infrastructure for Web-based digital library search engines such as CiteSeerX present several challenges. CiteSeerX provides autonomous citation indexing, full-text indexing, and extensive document metadata from documents crawled from the web across computer and information sciences and related fields. Infrastructure virtualization and cloud computing are particularly...
Mining and analyzing data from social networks can be difficult because of the large amounts of data involved. Such activities are usually very expensive, as they require a lot of computational resources. With the recent success of cloud computing, data analysis is going to be more accessible due to easier access to less expensive computational resources. In this work we propose to use cloud computing...
This paper proposes a software integration model based on service component architecture for the vending industry. We use this architecture to rapidly integrate related services, substantially reduce development costs, establish innovative services, and provide consumers with a brand new experiential shopping environment in the retail domain. Meanwhile, we apply cloud computing technology to solve the following...