There has been an increase in the use of image processing for object recognition. However, traditional methods are not suitable for real-time systems because they cannot match human-level performance. Recently, deep learning with Convolutional Neural Networks has come to be known as a solution for image recognition. Indeed, deep learning has produced many strong results in object recognition. However, it needs...
In this paper, we propose an SNS crawler engine for topic expansion. The Smart Broadcasting Platform uses only subtitles and scripts for topic extraction, but these do not contain sufficient words to adapt the platform to various domains. Therefore, more data sources need to be included to extract richer topics. We also introduce the system architecture of the SNS crawler engine, describe...
TRAMPER is an autonomous benthic crawler equipped with oxygen sensors to perform long-term flux time-series measurements at abyssal depths. The crawler was developed within the HGF-Alliance ROBEX. TRAMPER has five main subsystems: the titanium frame with flotation, the caterpillar drive system, the recovery and communication systems, the energy and electronics systems, and a multi-optode profiler as the scientific...
Innovative robotic technologies are key to studying ocean processes in space and time. The work carried out during the ROBEX Demonstration Mission on RV Polarstern will test the capability of new and innovative technologies, developed during the HGF Alliance ROBEX, in deep-sea environments. Investigations will include Arctic benthic and pelagic ecosystems strongly influenced by climate change, such...
Today, the web is used in virtually all spheres of daily life. The Internet is a worldwide storehouse of information, and a search engine is an information retrieval system that stores data about various websites. The search engine's index records the number of times each URL is accessed; the crawler visits each web URL to retrieve its metadata and flags page errors. The crawler is initially seeded with URLs of various...
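The indexing workflow sketched in this abstract (seed URLs, per-URL access counts, metadata retrieval, error flagging) can be illustrated with a minimal sketch. Everything below — the `Index` class, the stub fetcher, and the page contents — is a hypothetical illustration, not code from the paper:

```python
from collections import Counter
from html.parser import HTMLParser

class TitleParser(HTMLParser):
    """Extracts the <title> text from an HTML page (our stand-in for metadata)."""
    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""
    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
    def handle_data(self, data):
        if self._in_title:
            self.title += data

class Index:
    """Toy search-engine index: access counts, metadata, and page errors per URL."""
    def __init__(self, fetch):
        self.fetch = fetch      # fetch(url) -> (http_status, html_text)
        self.hits = Counter()   # number of times each URL was accessed
        self.meta = {}          # url -> page title
        self.errors = {}        # url -> HTTP status of failed fetches

    def visit(self, url):
        self.hits[url] += 1
        status, html = self.fetch(url)
        if status != 200:
            self.errors[url] = status    # flag the page error
            return
        parser = TitleParser()
        parser.feed(html)
        self.meta[url] = parser.title    # record the retrieved metadata

# Stub fetcher standing in for real HTTP requests.
PAGES = {
    "http://example.com/": (200, "<html><title>Example</title></html>"),
    "http://example.com/gone": (404, ""),
}
index = Index(lambda url: PAGES[url])
index.visit("http://example.com/")
index.visit("http://example.com/")
index.visit("http://example.com/gone")
```

Injecting the fetcher as a callable keeps the sketch testable without network access; a real crawler would pass in an HTTP client instead.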
Although Mainline DHT is not an official BitTorrent protocol so far, it is widely used in many BitTorrent clients such as uTorrent, Vuze, BitComet, and Xunlei. Although there are millions of Mainline DHT users, little research has focused on Mainline DHT. In this paper, we present two kinds of measurement results based on crawling Mainline DHT. First, we develop an effective crawler and crawl Mainline...
In recent years, the application of quantitative investment in the Chinese stock market has attracted more and more attention. An important step in quantitative investment is the acquisition of stock data, including each stock's daily data, market daily data, and fundamental data. There are more than two thousand shares listed on the Shanghai and Shenzhen stock markets, and each share's data...
In this paper, we propose a framework for a focused Linked Data (LD) crawler based on context graphs. A focused crawler searches a specific subset of the web; in our case, it targets interlinked RDF data stores. The proposed crawler constructs a set of context graphs for the given seed URIs by back-crawling the web, and classifiers are trained to detect and assign documents to different categories based...
The Internet has always been growing with the content and information added by different types of users. Without proper storage and indexing, this content can easily be lost in the sea of information housed by the Internet. Hence, an automated program, known as a web crawler, is used to index the content added to the Internet. With proper configuration and settings, a web crawler can...
This article gives an overview of the currently available literature on web page ranking algorithms that use machine learning. Web page ranking is a well-known approach to ranking the pages available on the web. It helps us understand how a search engine actually works and how a machine learns to give priority to the pages that most successfully fulfill the...
Web applications have become an important means of communication nowadays. As the popularity of web applications such as online transactions and net banking increases, the role of web security has grown as well. Web application vulnerabilities let attackers carry out malicious activities ranging from gaining unauthorized access to stealing sensitive data. Past research has...
Nowadays, in the context of online social media, hackers have started using social networks such as Twitter, Facebook, and Google+ for their unauthorized activities. These are very popular social networking sites used by numerous people to connect with each other and share their daily happenings. In this paper, we consider Twitter as such a social networking site to experiment...
Cloud services have emerged as one of the most important assets for a company. Amazon, Rackspace, Google, and Microsoft, to name a few, all fight to gain a foothold as cloud service providers. CB-Cloudle, a search engine aiming to discover the available cloud service options and to suggest the most appropriate alternatives, is presented here to meet end users' needs. In this work, this software...
The exponential growth of digital information has led to the need for increasingly sophisticated search tools such as web search engines. Search engines return ranked lists of documents and are less effective when users need precise answers to natural language questions. Question Answering systems provide this critical capability required for the next generation of web search engines, reducing the painstaking...
Today, the web is all about dynamic content: information created only when it is needed, i.e., resources that are not readily available to users. How, then, can a web crawler find a resource that is either protected by a session or hidden behind an authentication form? This question triggered a search for answers to some basic questions about web crawlers: what is a crawler? Why...
This paper describes a remote control system for a crawler-type mobile robot with a passive sub-crawler. Such a system is greatly advantageous because it has an essentially compliant mechanism in which the sub-crawler angle adapts to the road surface shape. Its operation is extremely simple: it is only necessary to control the movement direction and driving speed, in comparison with the case...
A general crawler downloads web pages of any kind, thus forming a source of information for a search engine. A blog crawler is similar to a general crawler except that it restricts its crawl boundary to the blog space, downloading only blog pages and ignoring the rest of the web. Since blogging is an emerging phenomenon and serves as a very useful source of information, a blog crawler proves...
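The boundary restriction described above can be sketched as a simple URL filter applied before a page is fetched. The host suffixes and path hint below are illustrative assumptions, not criteria taken from the paper:

```python
from urllib.parse import urlparse

# Illustrative blog-space heuristics; a real blog crawler would use its
# own boundary criteria, which the abstract does not spell out.
BLOG_HOSTS = ("blogspot.com", "wordpress.com", "medium.com")

def in_blog_space(url):
    """Return True if the URL looks like a blog page."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    return host.endswith(BLOG_HOSTS) or "/blog/" in parts.path.lower()

frontier = [
    "https://alice.wordpress.com/2020/01/post",
    "https://example.com/blog/hello-world",
    "https://example.com/shop/cart",
]
# Only blog-space URLs survive; the rest of the web is ignored.
kept = [url for url in frontier if in_blog_space(url)]
```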
Metadata integration and use represent an open problem due to the rapid expansion of data resources in a non-standard-compliant environment. The increasing need to access and retrieve relevant data based on semantic similarity from distributed, autonomous, and heterogeneous sources demands innovative solutions that can offer an integrated global view over various local schemas. The framework proposed...
With the development of computer networks and the widespread use of the Internet, online information is increasing exponentially, and the difficulty and complexity of information retrieval increase along with it, so crawlers are developing rapidly. A crawler is a program that automatically collects information from the Internet. In this paper, we design and implement a multi-threaded crawler for specific resources...
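A multi-threaded crawler of the kind described can be sketched with Python's standard `threading` and `queue` modules. The worker pool, the injected `fetch` callable, and the termination scheme below are illustrative assumptions, not the paper's actual design:

```python
import queue
import threading

def crawl(seeds, fetch, num_workers=4):
    """Crawl from seed URLs using a pool of worker threads.

    fetch(url) must return the list of links found on that page;
    a real crawler would download and parse the page here.
    """
    frontier = queue.Queue()      # shared work queue of URLs to visit
    seen = set(seeds)             # URLs already enqueued (avoid re-crawling)
    results = {}                  # url -> outgoing links
    lock = threading.Lock()       # guards `seen` and `results`

    for url in seeds:
        frontier.put(url)

    def worker():
        while True:
            url = frontier.get()  # blocks until work is available
            try:
                links = fetch(url)
                with lock:
                    results[url] = links
                    for link in links:
                        if link not in seen:
                            seen.add(link)
                            frontier.put(link)
            finally:
                frontier.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    frontier.join()               # returns once every enqueued URL is processed
    return results
```

Daemon workers simply block on the empty queue once the crawl is done and are reaped when the process exits; `Queue.join()` plus `task_done()` gives deterministic termination without polling timeouts.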
Studies report that about 40% of current Internet traffic and bandwidth consumption is due to web crawlers that retrieve pages for indexing by the different search engines. As the size of the web continues to grow, searching it for useful information has become increasingly difficult. Centralized crawling techniques are unable to cope with the constantly growing web. This paper presents...