Researchers have started looking for Internet traffic recognition techniques that do not depend on ‘well known’ TCP or UDP port numbers or on interpreting the contents of packet payloads. Newer approaches classify traffic by recognizing statistical patterns in externally observable attributes of the traffic (such as typical packet lengths and inter-arrival times). The main goal is to cluster or...
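As a minimal sketch of the kind of externally observable attributes mentioned above, the following Python computes the mean packet length and mean inter-arrival time of a single flow. The function name `flow_features` and the `(timestamp, length)` input format are assumptions made for illustration; they are not taken from the paper.

```python
from statistics import mean


def flow_features(packets):
    """Return (mean packet length, mean inter-arrival time) for one flow.

    `packets` is a list of (timestamp_seconds, length_bytes) tuples,
    a hypothetical input format chosen for this sketch.
    """
    if not packets:
        raise ValueError("flow has no packets")
    ordered = sorted(packets)  # order packets by timestamp
    lengths = [length for _, length in ordered]
    times = [ts for ts, _ in ordered]
    # Gaps between consecutive packet arrivals.
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    mean_gap = mean(gaps) if gaps else 0.0  # a single packet has no gaps
    return mean(lengths), mean_gap
```

Feature vectors of this kind could then be passed to any clustering or classification routine; which one the authors use is not stated in the visible text.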
Information search is one of the most common activities on the Internet. Several search engines crawl the web around the clock and fetch useful information according to users' interests. To date, most of these search engines rely on keyword-based search, which might be inappropriate for some naive users such as children. It may be very difficult for children to search for their desired content...
Processes within real-world networks evolve according to the underlying graph structure. A number of examples exist in diverse network genres: botnet communication growth, moving traffic jams [1], information foraging [2] in document networks (WWW and Wikipedia), and spread of viral memes or opinions in social networks. The network structure in all the above examples remains relatively fixed, while...
During and after the Great East Japan Earthquake, a great deal of false information and many rumors spread on Twitter. To cope with this, we proposed a method for automatically assessing the credibility of information based on topic and opinion classification [16]. The credibility of information is assessed by calculating the ratio of the same opinions to all opinions about a topic. To identify the topic...
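The ratio described above can be sketched in a few lines of Python. This assumes the topic and opinion classification has already been done and each tweet on a topic carries an opinion label; the function name `credibility` and the label strings are hypothetical.

```python
from collections import Counter


def credibility(opinions, target):
    """Share of the opinions on one topic that match `target`.

    `opinions` is a list of opinion labels for a single topic
    (a hypothetical format; the paper's classifier is not shown).
    """
    if not opinions:
        raise ValueError("no opinions for this topic")
    counts = Counter(opinions)
    return counts[target] / len(opinions)
```

For example, three "agree" labels out of four opinions on a topic give a ratio of 0.75 for "agree".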
Classification of network traffic is required for many network management tasks such as flow prioritization, traffic shaping/policing, and diagnostic monitoring. Many approaches have been developed for this purpose. Classical approaches such as port number or payload analysis methods have their own limitations. For example, some applications use dynamic port numbers and encryption...
Due to the inherent design faults of the Border Gateway Protocol (BGP), BGP prefix hijacking remains a serious security threat to the Internet routing system. AS hijacking enables an attacker to pass the prefix ownership validation mechanism, which makes it more sophisticated than IP prefix hijacking. So far, much effort has gone into the detection of prefix hijacking; however, AS hijacking has not received...
Because BGP lacks a mechanism to verify the authority of an Autonomous System (AS) to announce Network Layer Reachability Information (NLRI), a specific IP prefix may be hijacked by a suspicious AS, leading to Internet instability or even a crash. Current proposals are either still not widely deployed because of expensive overhead and complex key management, such as S-BGP and soBGP, or can be incrementally...
The use of online review sites has grown significantly, allowing communities to share information on products or services. These online review sites are marketed as being independent and trustworthy, but have been criticised for not ensuring the integrity of the reviews. One major concern is review fraud, where a person (such as a marketer) is paid to write favourable reviews for one product...
Depending on the desired level of word analysis, Arabic stemming algorithms can be classified into stem-based (light stemming) algorithms and root-based algorithms. Light stemming algorithms only remove prefixes and suffixes from the words, while root-based algorithms remove prefixes, suffixes and infixes. There are several light stemmers for Arabic (Light1, Light2, Light3, Light8, and Light10),...
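To make the light-stemming idea concrete, here is a minimal Python sketch that strips at most one prefix and one suffix and leaves infixes untouched. The affix lists and the minimum stem length are illustrative assumptions; they are not the exact rules of Light1, Light2, Light3, Light8, or Light10.

```python
# Illustrative affix lists (an assumption for this sketch, not the lists
# of any named stemmer). Longer affixes come first so they win.
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي"]
MIN_STEM_LEN = 2  # assumed lower bound on what must remain after stripping


def light_stem(word):
    """Strip at most one prefix and one suffix; infixes are left alone."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= MIN_STEM_LEN:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LEN:
            word = word[:-len(suffix)]
            break
    return word
```

A root-based algorithm would go further and also remove infixes, which this sketch deliberately does not attempt.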
Hierarchical models have become one of the most widely adopted and effective solutions for organizing large volumes of documents. Although there are general taxonomies on the Web, we observe that in most cases there are many inconsistencies between a general taxonomy and specific resources, as the generation of taxonomies is independent of the resources. Besides with the newly available resources into...
Open-source payload-based traffic classifiers are frequently used as a source of ground truth in the traffic classification research field. However, there have been no comprehensive studies that provide evidence that the classifications produced by these software tools are sufficiently accurate for this purpose. In this paper, we present the results of an investigation into the accuracy of four open-source...
Inferring Autonomous System (AS) level end-to-end paths is valuable for both network operators and researchers, and the problem has been widely studied. Many algorithms are based on an AS topology labeled with AS business relationships, and return the path sets that satisfy the shortest valley-free property. However, they all treat the paths in the candidate path set as undifferentiated,...
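The valley-free property mentioned above is commonly stated as: a path climbs zero or more customer-to-provider links, crosses at most one peer link, and then descends zero or more provider-to-customer links. The Python sketch below checks only that property (not path length). The `rel` dictionary encoding and the labels "c2p", "p2p", and "p2c" are assumptions made for illustration.

```python
def is_valley_free(path, rel):
    """Check the valley-free property for an AS path.

    `rel[(a, b)]` is "c2p" if a is a customer of b, "p2c" if a is a
    provider of b, and "p2p" if they are peers (a hypothetical encoding).
    """
    descending = False  # becomes True after the first peer or downhill link
    for a, b in zip(path, path[1:]):
        link = rel[(a, b)]
        if link == "c2p":
            if descending:
                return False  # climbing again would form a valley
        elif link == "p2p":
            if descending:
                return False  # at most one peer link, only at the top
            descending = True
        elif link == "p2c":
            descending = True
        else:
            raise ValueError(f"unknown relationship: {link}")
    return True
```

Candidate paths that pass this check could then be filtered by length to obtain the shortest valley-free set.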
Because students currently choose online test questions blindly, an item recommendation system is needed. According to the student's level, an estimating algorithm is used to sort items and recommend the top-ranked question to the student. According to the student's study over a certain period of time and all the answers to the...
Some existing micro-blog commercial word extraction algorithms consider only the relationship between keywords and the number of times they appear in texts, ignoring the keywords' distribution within a given category, which reduces the accuracy of micro-blog commercial word extraction. To solve this problem, the application of the TF-IDF algorithm to word weight calculation was...
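For reference, a minimal sketch of textbook TF-IDF weighting in Python follows. It is not the paper's variant, whose details are cut off in the abstract; the function name `tf_idf` and the pre-tokenized input format are assumptions.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / document frequency),
    one common textbook formulation.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights
```

A word that occurs in every document gets weight zero under this formulation, which is why frequency alone is a weaker signal than frequency combined with distribution.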
With the advent of high-speed links, online flow measurement of, e.g., flow round trip time (RTT) becomes difficult due to the enormous demands on computational resources. We address this problem by proposing the double-deletion Bloom filter (DDBF) scheme, which alleviates potential hash collisions of a standard Bloom filter by explicitly deleting used records and implicitly deleting...
Error detection in OCR output using dictionaries and statistical language models (SLMs) has been common practice in the design of post-processors for some time now. Multiple strategies have been used successfully in English to achieve this. However, this has not yet translated into improved error detection performance in many inflectional languages, especially Indian languages. Challenges such...
This paper evaluates an automated scheme for aligning and combining optical character recognition (OCR) output from three scans of a book to generate a composite version with fewer OCR errors. While there has been some previous work on aligning multiple OCR versions of the same scan, the scheme introduced in this paper does not require that scans be from the same copy of the book, or even the same...
Internet traffic classification is important in many aspects of network management such as data exploitation detection, malicious user identification, and restricting application traffic. Previously, features such as port and protocol numbers have been used to classify traffic, but these features can now be changed easily, making their use in traffic classification inadequate. Consequently, traffic...
In several research areas, e.g. in the field of human-robot interaction, ratings or questionnaires are administered using offline and online methods. An argument for the use of online methods is their efficiency. By using the Internet, data can be collected much faster than in an offline experiment, and the administration effort is very low. The goal of our study was to find out whether there is a difference...
Content-Centric Network (CCN) provides a clean-slate design for the Internet, where content becomes the primitive of communications. In CCN, routers are equipped with content stores, which act as caches for frequently requested content. This design enables the Internet to provide content distribution services without any application-layer support. On the other hand, as caches are integrated into routers,...