This paper proposes a new system of categorization and classification using data mining techniques based on certain criteria/topics. We describe the design and implementation of the proposed system, which automatically categorizes a restaurant as good or bad, using data mining techniques, based on users' reviews. For this study we took a data set consisting of approximately 9,000 reviews for 2,355...
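The abstract does not specify the authors' data-mining pipeline; as an illustration only, labeling a restaurant from the net polarity of its reviews can be sketched with a toy sentiment lexicon (the word lists and threshold below are hypothetical):

```python
# Hypothetical polarity lexicon; a real system would mine these terms
# from the review corpus rather than hard-code them.
POSITIVE = {"good", "great", "delicious", "friendly", "excellent"}
NEGATIVE = {"bad", "slow", "rude", "bland", "terrible"}

def classify_restaurant(reviews):
    """Label a restaurant 'good' or 'bad' from the net polarity of its reviews."""
    score = 0
    for review in reviews:
        words = review.lower().split()
        score += sum(w in POSITIVE for w in words)
        score -= sum(w in NEGATIVE for w in words)
    return "good" if score >= 0 else "bad"
```

A real classifier would of course weigh many more features (ratings, n-grams, negation), but the aggregate-then-threshold shape is the same.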
The growing number of users actively sharing information and interacting with others on online social networks has, often unconsciously, produced a wealth of data that can serve as research material for various purposes. Hence, data crawling is a critical first gate to accessing information in a social network. This study aims to develop data-crawler software by using...
There are about 3 billion indexed websites on the WWW. Not all websites belonging to a particular topic are indexed by a search engine such as google.com; there are online platforms where other users help a person asking for a Uniform Resource Locator (URL) that contains topical information. To verify the authenticity and validity of the URL, an empirical methodology and its...
Social emotion analysis of online users has become an important task for mining public opinions, which aims at detecting the readers' emotions evoked by online news articles. In this paper, we focus on building a social emotion analysis system (SEAS) for online news. The system implements a text data crawler for mainstream online news websites, along with modules for document preprocessing, document...
Social Network Analysis (SNA) is a field of study that focuses on analyzing user profiles and participation on social network channels in order to model relationships between people and to predict certain behaviors or knowledge. To achieve their goals, researchers interested in SNA have to extract content and structure from the numerous social networks available today. Existing tools, which help...
Modern web users are exposed to a browser security threat called drive-by-download attacks that occur by simply visiting a malicious Uniform Resource Locator (URL) that embeds code to exploit web browser vulnerabilities. Many web users tend to click such URLs without considering the underlying threats. URL blacklists are an effective countermeasure to such browser-targeted attacks. URLs are frequently...
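The blacklist countermeasure described above reduces, at its core, to a lookup of a URL's host against a maintained deny-list. A minimal sketch (the blacklist entries here are placeholders; real deployments consume frequently refreshed feeds such as Google Safe Browsing or PhishTank):

```python
from urllib.parse import urlsplit

# Hypothetical blacklist; in practice this set is rebuilt from an
# external feed on a short refresh cycle, since malicious URLs churn fast.
BLACKLIST = {"malicious.example.com", "drive-by.example.net"}

def is_blacklisted(url):
    """Return True if the URL's host appears on the blacklist."""
    host = urlsplit(url).hostname or ""
    return host.lower() in BLACKLIST
```

Matching on the normalized hostname rather than the full URL string avoids trivial evasion via path or query-string changes, though real systems also match URL patterns and page content.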
Automatic 3D neuron reconstruction for very large 3D light microscopy images remains a challenge in neuroscience. Few existing neuron tracing algorithms can be used with commonly available computers (laptops, desktops, or workstations) to efficiently and accurately reconstruct a neuron in image stacks that are tens of gigabytes or greater. We introduce a new automatic tracing algorithm called...
The number of Internet users and uses is growing tremendously these days, which causes considerable trouble and effort on the user's side to find web pages that are relevant to the user's requirements. Generally, users search for web pages from a large available hierarchy of concepts, or use a query to browse web pages through an available search engine and receive results based on the search pattern...
Social networks, as corpora of valuable data, have attracted much attention from researchers in various fields in recent years, especially in the subject of big data analytics. However, as the foundation, efficient and accurate data collection has not received much focus in past published works. As the amount of data on the web increases rapidly, this article identifies two major...
The Universal Communication Research Institute (UCRI), NICT conducts research and development on universal communication technologies: multi-lingual machine translation, spoken dialogue, information analysis and ultra-realistic interaction technologies, through which people can truly interconnect, anytime, anywhere, about any topic, and by any method, transcending the boundaries of language, culture,...
The web HITS algorithm is based on data acquisition modules for large-scale data collection. The algorithm typically uses Web link-structure mining to establish data bindings between page links in order to improve the link structure. This paper suggests another acquisition module to improve the HITS algorithm, which has also been implemented and applied on a government website platform.
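The underlying HITS iteration scores each page as a hub (points to good authorities) and an authority (pointed to by good hubs). A minimal sketch over a toy link graph (node names are illustrative), with scores L2-normalized each round as in Kleinberg's formulation:

```python
import math

def hits(links, iterations=50):
    """links: dict mapping page -> list of pages it links to."""
    nodes = set(links) | {t for ts in links.values() for t in ts}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of pages linking in.
        auth = {n: sum(hub[s] for s, ts in links.items() if n in ts)
                for n in nodes}
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score: sum of authority scores of pages linked to.
        hub = {n: sum(auth[t] for t in links.get(n, [])) for n in nodes}
        norm = math.sqrt(sum(v * v for v in hub.values())) or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth

links = {"a": ["b", "c"], "b": ["c"], "c": []}
hub, auth = hits(links)
```

Here "c", linked by both other pages, ends up with the top authority score, while "a", linking to both, is the top hub.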
Based on analysis and research of the serial formation of swarm robots' movement, a strategy for keeping swarm-robot formation during the processes of schlepping and turning is proposed, building on a discussion of the robot kinematic model. Under wireless communication, the master and slave robots can control their speed based on environmental information in order to keep the spacing between...
Detecting P2P swarms and analyzing their distribution is a challenging task that has not received the attention it deserves. In this paper, we demonstrate an active measurement methodology to continuously trace real-world BitTorrent and eMule/eDonkey swarms over the Internet from a stub network and over a long period of time. Our measurements achieve real-time scanning of the online...
The number of files stored on a personal computer is increasing very quickly, so it is difficult for users to find the information they want. A desktop search engine named SoDesktop is proposed in this paper, which is composed of four modules: Data crawler, Task scheduler, Data indexer, and Data searcher. The implementations of these four modules are described in detail, and the implementation...
Nowadays people use search engines all the time to retrieve documents from the Web. Web crawling is the process by which a search engine gathers pages from the Web in order to index them and support search. Web crawlers are the heart of search engines. Web crawlers continuously crawl the web, finding any new web pages that have been added and pages that have been removed...
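The crawling process described above is essentially a graph traversal over hyperlinks from a seed URL. A minimal breadth-first sketch, with fetching abstracted behind a `fetch(url)` callback so the traversal logic stands alone (a real crawler's fetch would download and parse the page, respect robots.txt, and throttle requests):

```python
from collections import deque

def crawl(seed, fetch, limit=100):
    """Breadth-first crawl from seed; fetch(url) returns the outgoing links."""
    seen = {seed}
    frontier = deque([seed])
    order = []
    while frontier and len(order) < limit:
        url = frontier.popleft()
        order.append(url)
        for link in fetch(url):
            if link not in seen:  # skip already-discovered pages
                seen.add(link)
                frontier.append(link)
    return order

# Toy in-memory link graph standing in for live pages.
pages = {"/": ["/a", "/b"], "/a": ["/b", "/c"], "/b": [], "/c": ["/"]}
visited = crawl("/", lambda u: pages.get(u, []))
```

The `seen` set is what lets the crawler also notice removed pages on a recrawl: URLs recorded in a previous pass that no longer resolve can be dropped from the index.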
Studies report that about 40% of current Internet traffic and bandwidth consumption is due to the web crawlers that retrieve pages for indexing by the different search engines. As the size of the web continues to grow, searching it for useful information has become increasingly difficult. Centralized crawling techniques are unable to cope with the constantly growing web. This paper presents...
In order to apply web-crawler technology to the construction of video semantic information, this paper proposes a video information collection architecture based on a Web crawler, after analyzing the basic principles, key technologies, and problems of current crawler technology. The system framework is divided into two parts: the background part mainly manages the Web crawler; the foreground part mainly...
The expansion of the World Wide Web has led to a state where a vast number of Internet users face, and have to overcome, the major problem of discovering desired information. Inevitably, hundreds of web pages and weblogs are generated or changed on a daily basis. The main problem that arises from this continuous generation and alteration of web pages is the discovery of useful information,...
An FTP search engine is one of the most important tools in network applications. This paper presents a design of an FTP search engine system based on Lucene, which develops a multithreaded spider as an extension of Lucene and improves Chinese word segmentation with a maximum matching algorithm over the Lucene documents. Finally, the main functions and run-time examples of this system are shown.
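The maximum matching algorithm mentioned above greedily takes, at each position, the longest dictionary word that matches, falling back to a single character when nothing matches. A minimal forward-maximum-matching sketch (the dictionary here is a toy example, not the one the system uses):

```python
def max_match(text, dictionary, max_len=4):
    """Forward maximum matching: greedily take the longest dictionary
    word at each position, falling back to single characters."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first, shrinking until one matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words

# Toy dictionary: "搜索" (search), "引擎" (engine), "搜索引擎" (search engine).
DICT = {"搜索", "引擎", "搜索引擎"}
```

Because the longest entry wins, "搜索引擎" segments as one word rather than as "搜索" + "引擎", which is exactly the behavior that improves indexing of Chinese filenames.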
A web search engine is designed to search for information on the World Wide Web (WWW). Crawlers are software that traverse the Internet and retrieve web pages by following hyperlinks. In the face of numerous spam websites, traditional web crawlers cannot cope well with this problem. Focused crawlers utilize semantic web technologies to analyze the semantics of hyperlinks and web documents. The...