With the rapid development of e-commerce, a large number of product reviews have appeared on the Internet. Sentiment orientation analysis and mining of product reviews have become important for studying product reputation. Sentiment analysis involves analyzing and classifying the content of reviews about a product, event, place, etc. as positive, negative, or neutral opinion. In this...
Constructing a natural-language semantic corpus is a key step in implementing information exchange in an intelligent cloud-computing environment. This paper analyzes semantic corpus construction technologies in detail and proposes a new webpage de-duplication algorithm based on TF-IDF and word vector distance. Experimental results show the accuracy and efficiency of the proposed method. Our...
Nowadays social media is flourishing in the form of online social networks (e.g., Facebook [1], Google+ [2]) and video streaming sites (e.g., YouTube [3]), and the two types of systems are converging. More and more media content (video clips, images, etc.) is published and shared among users on social network sites, while video streaming systems are increasingly leveraging social...
To address the shortcomings of traditional focused crawlers, this paper proposes an improved focused crawling method based on syntactic dependency analysis. The method first generates a collection of words from the text using the TF-IDF algorithm and a collection of phrases through syntactic dependency analysis. It then evaluates the collections of words and phrases to select a set of keywords for the text...
With the rapid rise in social network users in recent years, social networks are changing business models in China's Internet industry. Social networks produce large amounts of data that reflect the real world, so we can draw conclusions about financial incidents by monitoring people's interests on social networks and analyzing investors' feelings based on the data. To obtain the data from financial...
A network motif is a frequent and unique sub-graph pattern defined in a network, and it has been applied in various biological and medical problems. However, finding network motifs is a computationally intensive task, as it involves heavily resource-demanding tasks. We have suggested a couple of parallelization efforts to alleviate the computational intensity in the past, including MASS (Multi-Agent Spatial...
Online Public Opinion Systems (OPOS) aim to collect, analyze, summarize, and monitor massive public opinions on the Internet in real time. Meanwhile, OPOS can often identify key or sudden events and notify the relevant people immediately so that they can respond rapidly to these events. As part of this endeavor, this paper introduces the architecture and techniques of an OPOS that...
The World Wide Web has become the major source of information dissemination. Due to its vast expansion and heterogeneity, users face difficulty in finding relevant results quickly. Ranking is an important application of web mining, which is based on structure, content, and usage. Many algorithms exist for web page ranking, and these algorithms are based upon one or more parameters such as forward links,...
The rapid increase in Internet users, along with the growing power of online review sites and social media, has given birth to sentiment analysis, or opinion mining, which aims to determine what other people think and comment. Nowadays, several websites are available on which a variety of products are advertised and sold. Prior to making a purchase, an online shopper typically browses through several similar...
Centralized search engines suffer from excessive server load and limited scalability when dealing with massive amounts of Internet information, and the search results of general search engines are not very accurate. To solve these problems, a vertical search engine based on Hadoop, called HVSE, was designed and developed. HVSE is based on the basic principles of traditional search engines. It improved...
A large number of URLs collected by web crawlers correspond to pages with duplicate or near-duplicate contents. To crawl, store, and use such duplicated data implies a waste of resources, the building of low quality rankings, and poor user experiences. To deal with this problem, several studies have been proposed to detect and remove duplicate documents without fetching their contents. To accomplish...
In 2012, the Dutch National Research and Education Network, SURFnet, observed a multitude of Distributed Denial of Service (DDoS) attacks against educational institutions. These attacks were effective enough to cause the online exams of hundreds of students to be cancelled. Surprisingly, these attacks were purchased by students from websites, known as Booters. These sites provide DDoS attacks as a...
Web page ranking algorithms are used to score the uniform resource locators (URLs), or simply the online links, of web applications. The corporate world strives to develop web applications in such a way that they appear among the top results in the major search engines and search directories. A number of web page ranking algorithms have been developed with different scientific approaches making use of...
Information security is vital for state security, especially for intelligence services. An OSIA (open source intelligence analyzing) system based on cloud computing and a domestic platform is designed and implemented in this paper. For the sake of the security and utility of OSIA, all of the middleware and the operating systems involved are compatible with domestic software. The OSIA system concentrates on analyzing open...
This article gives an overview of the currently available literature on web page ranking algorithms that use machine learning. Web page ranking is a well-known approach to ranking the pages available on the web. It helps us understand how a search engine works and how a machine learns on its own when prioritizing pages, that is, which page is important to successfully fulfill the...
The number of Internet users and uses is growing tremendously these days, which makes it very difficult and laborious for users to find web pages that are relevant to their requirements. Generally, users search for web pages either through a large hierarchy of concepts or by submitting a query to a search engine, and they receive results based on the search pattern...
This paper explores measuring the similarity between web objects, which is one of the fundamental tasks in the information retrieval domain. It proposes a framework for improved and efficient web object search based on the search domain. The proposed approach defines the similarity between two objects (an object can be a link or text content) and retrieves the related links with their content...
With the rapid development of the network, stand-alone crawlers struggle to find and gather massive amounts of information, and crawlers will gradually become distributed. This paper proposes a task scheduling strategy based on weighted round-robin for small-scale distributed crawlers and formulates weights for the current node based on crawling efficiency, so that each node can load balance...
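As a rough illustration of weighted round-robin scheduling in general, the sketch below uses the "smooth" variant of the technique to hand out crawl tasks to nodes. It is not the paper's method: the abstract does not give its efficiency-based weight formula, so the function name `weighted_round_robin`, the node names, and the fixed integer weights are all hypothetical.

```python
def weighted_round_robin(weights, n_tasks):
    """Assign n_tasks to nodes using smooth weighted round-robin.

    weights: dict mapping node name -> positive weight. The weights here
    are hypothetical placeholders; the abstract derives them from crawling
    efficiency with a formula that is not shown.
    Returns the list of node names in the order tasks are assigned.
    """
    current = {node: 0 for node in weights}
    total = sum(weights.values())
    order = []
    for _ in range(n_tasks):
        # Every node gains credit proportional to its weight.
        for node, weight in weights.items():
            current[node] += weight
        # The node with the most accumulated credit takes the next task.
        chosen = max(current, key=current.get)
        current[chosen] -= total
        order.append(chosen)
    return order


if __name__ == "__main__":
    # Hypothetical example: node "a" is treated as three times as efficient as "b".
    print(weighted_round_robin({"a": 3, "b": 1}, 8))
```

Over any window of `sum(weights)` consecutive tasks, each node receives a number of tasks equal to its weight, so faster nodes get proportionally more work without long bursts to a single node.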
Cloud services have emerged as one of the most important parts of a company. Amazon, Rackspace, Google, and Microsoft, to name a few, all fight to gain a foothold as cloud service providers. CB-Cloudle, a search engine that aims to discover the available cloud service options and to suggest the most appropriate alternatives, is presented here to meet end users' needs. In this work, this software...
The Web HITS algorithm is based on data acquisition modules for large-scale data collection. It is a typical algorithm that mines the Web link structure to establish data bindings between page links and improve the linked structure. This paper proposes another acquisition module to improve the HITS algorithm, and the module has also been applied in practice on a government website platform.
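For readers unfamiliar with HITS, the following is a minimal sketch of the standard hub and authority iteration on a small link graph. It illustrates only the textbook algorithm, not the acquisition module proposed in the paper; the function name `hits`, the graph representation, and the iteration count are assumptions made for this example.

```python
from math import sqrt


def hits(graph, iterations=50):
    """Compute hub and authority scores with the textbook HITS iteration.

    graph: dict mapping a page to the list of pages it links to
    (a hypothetical in-memory representation chosen for this sketch).
    Returns (hubs, authorities) as dicts of L2-normalized scores.
    """
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    hubs = {n: 1.0 for n in nodes}
    auths = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of the hub scores of pages linking to the page.
        new_auths = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for target in targets:
                new_auths[target] += hubs[src]
        norm = sqrt(sum(v * v for v in new_auths.values())) or 1.0
        auths = {n: v / norm for n, v in new_auths.items()}
        # Hub score: sum of the authority scores of the pages it links to.
        new_hubs = {n: sum(auths[t] for t in graph.get(n, [])) for n in nodes}
        norm = sqrt(sum(v * v for v in new_hubs.values())) or 1.0
        hubs = {n: v / norm for n, v in new_hubs.items()}
    return hubs, auths


if __name__ == "__main__":
    # Hypothetical three-page graph: "a" and "b" both link to "c".
    hub_scores, authority_scores = hits({"a": ["c"], "b": ["c"], "c": []})
    print(hub_scores, authority_scores)
```

In the small example, the page that receives links from both other pages ends up with the highest authority score, while the two linking pages share equal hub scores.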