The system mainly studies the discovery of mass events from images found on the Internet; this paper focuses on quantifying Flickr tag documents. It also implements a single-pass clustering algorithm based on traditional text clustering. Three strategies for the single-pass clustering algorithm are implemented, analyzed, and compared. Different document order in the event of the discovery...
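The single-pass clustering mentioned in the abstract above can be illustrated with a minimal sketch. It is not the paper's implementation: the bag-of-words representation, the cosine similarity, the threshold value, and the names `cosine` and `single_pass` are all assumptions made for illustration. Each document is compared with the existing cluster centroids in arrival order and either joins the most similar cluster or starts a new one, which is why the result can depend on document order.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(docs, threshold=0.3):
    """Cluster documents in one pass; returns lists of document indices.

    The threshold is a hypothetical value chosen for this example.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [int]}
    for i, text in enumerate(docs):
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["members"].append(i)
            # Summed counts act as the centroid; cosine ignores scale.
            best["centroid"].update(vec)
        else:
            clusters.append({"centroid": vec, "members": [i]})
    return [c["members"] for c in clusters]


docs = [
    "concert crowd music festival",
    "music festival crowd stage",
    "marathon runners city race",
    "city marathon race finish",
]
result = single_pass(docs)
print(result)
```

Lowering the threshold merges more documents into existing clusters; raising it produces more, smaller clusters.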
As a leading energy player, EDF (Électricité de France) actively works on new techniques to better understand the voice of its customers. To process massive unstructured and semi-structured text data, we have developed HETA, an application based on open source solutions that offers different text processing steps (document engineering, text analysis, clustering and visualization) on top of Hadoop...
The service architecture of the Internet becomes increasingly complex as it expands as a medium for large-scale distribution of diverse content. The dynamic growth of various content distribution systems, deployed by influential Internet companies, content distributors, aggregators and owners, has a substantial impact on the distribution of network traffic and the scalability of various Internet services....
Botnets have become one of the major tools used by attackers to perform various malicious activities on the Internet, such as launching distributed denial of service attacks, sending spam, leaking personal information, and so on. In this paper, we present BotCatch, a behavior-based botnet detection system that considers multiple coordinated group activities in the monitored network to identify bot-infected...
With the wide use of computer systems and the Internet and the rapid growth of computer networks, intrusion detection has become an important concern in network security. Accordingly, various intrusion detection systems have been developed using misuse detection and anomaly detection methodologies. These systems try to improve detection rates of variation in attack types...
The traditional k-means algorithm has been applied successfully to various problems, but its application is restricted to small datasets. Online websites like Twitter have large amounts of data that must be handled properly. There is therefore a need for a platform that can perform faster data clustering, which led to the development of Mahout/Hadoop. Mahout is machine learning library approach to parallel clustering...
Opinion leaders are core users in online communities who can guide the direction of public opinion. With the rapid development of microblogs, identifying microblog opinion leaders has become a significant task. In this paper, we propose a hybrid data mining approach based on user features and the interaction network, which includes three parts: a way to analyze users' authority, activity...
As a product of Web 2.0, the micro-blog has developed rapidly in recent years. More and more information, including social hotspots and news events, spreads on micro-blogs because of their speed and convenience. As a result, discovering, extracting and analyzing this information have become research hotspots. By studying micro-blog text and long-text clustering, this article concludes that traditional cluster...
Flow characteristics describe the patterns and trends of network traffic. They help network operators understand network usage and user behavior, and are especially useful for those concerned with network capacity planning, traffic engineering and fault handling. Due to the large scale of datacenter networks and the explosive growth of traffic volume, it is hard to collect, store and analyze Internet...
Social network analysis comprises a popular set of tools for the analysis of online social networks. Among these techniques, k-shell decomposition of a graph is a popular technique that has been used for centrality analysis, for community discovery, for the detection of influential spreaders, and so on. The huge volume of input graphs and the environments where the algorithm needs to run, i.e., large...
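For readers unfamiliar with k-shell decomposition, the following is a minimal single-machine sketch of the standard peeling procedure. It is not the algorithm proposed in the paper above, whose abstract is truncated; the function name `k_shell` and the adjacency-dictionary input format are assumptions made for illustration. Nodes of degree at most k are removed repeatedly, and each removed node is assigned shell index k before k is increased.

```python
def k_shell(adj):
    """Return {node: shell index} for an undirected graph.

    `adj` maps each node to the set of its neighbours and is assumed
    to be symmetric (if b is in adj[a], then a is in adj[b]).
    """
    degree = {v: len(neighbours) for v, neighbours in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 0
    while remaining:
        # Nodes whose current degree is at most k belong to shell k.
        peel = [v for v in remaining if degree[v] <= k]
        if not peel:
            k += 1
            continue
        while peel:
            v = peel.pop()
            if v not in remaining:
                continue  # already removed via an earlier entry
            remaining.discard(v)
            shell[v] = k
            for u in adj[v]:
                if u in remaining:
                    degree[u] -= 1
                    if degree[u] <= k:
                        peel.append(u)
    return shell


# A triangle (a, b, c) with one pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
shells = k_shell(graph)
print(shells)
```

In this example the pendant node falls in shell 1 and the triangle forms shell 2, so the nodes with the highest shell index mark the densest core of the graph.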
Wireless mesh networks (WMNs) have emerged as a key technology for next-generation wireless networking because of their advantages over other wireless networks. Because WMNs keep growing, their traffic volume is expected to be very high, so load balancing becomes crucial. Network load balancing enhances the scalability and availability of the network. Load balancing can be...
The role of the intrusion detection system is to enforce the pattern matching policies decided for the network. The proposed IDS runs on the KDD'99 data set, which is used internationally to evaluate the performance of various intrusion detection systems (IDS). The first step is the association phase, in which frequent item sets are produced by the Apriori algorithm. The...
Polymorphic worms are considered the most dangerous threats to Internet security; the danger lies in their changing payloads in every infection attempt to evade security systems. In this paper, we propose an accurate signature generation system for zero-day polymorphic worms. We have designed a novel Double-honeynet system, which is able to detect zero-day polymorphic worms that have...
Online reviews greatly impact consumers' purchasing decisions. A slight difference in a business' rating on a review website can significantly change the company's bottom line in some cases. By the same token, review websites are often targeted by spammers with fraudulent reviews, either to exaggerate the positive features of a business itself or to defame a competitor with negative ratings/comments...
By analyzing the characteristics of the vast amount of disaster news on the Internet, this paper proposes a new topic detection algorithm based on Group Average Hierarchical Clustering (GAHC) that is suitable for processing big data on the network. The core idea of GAHC is to divide big data into smaller groups, and then cluster the groups hierarchically to generate final topics. During the process of clustering,...
Social media offers an opportunity for emergency management to identify issues that need immediate reaction. To support the effective use of social media, an analysis approach is needed to identify crisis-related hotspots. We consider in this investigation the analysis of social media (i.e., Twitter, Flickr and YouTube) to support emergency management by identifying sub-events. Sub-events are significant...
Applications of network classification can be seen in many domains. These vary from preserving the quality of the network to analyzing personal characteristics of network users. However, current methods applied to network data classification do not meet expectations. This is because networks are dynamic and prone to rapid change, while the methods used for classification have been either...
Recommendation systems are important big data applications that are used in many business sectors of the global economy. While many users utilize Hadoop-like MapReduce systems to implement recommendation systems, we utilize the high-performance shared-memory MapReduce system Phoenix++ [1] to design a faster recommendation engine. In this paper, we design a distributed out-of-core recommendation algorithm...
Classification of network traffic is widely required for many network management tasks such as flow prioritization, traffic shaping/policing, and diagnostic monitoring. Many approaches have been developed for this purpose. Classical approaches such as port number or payload analysis methods have their own limitations. For example, some applications use dynamic port numbers and encryption...
The Internet is a decentralized structure that offers speedy communication, has a global reach and provides anonymity, a characteristic invaluable for committing illegal activities. In parallel with the spread of the Internet, cybercrime has rapidly evolved from a relatively low volume crime to a common high volume crime. A typical example of such a crime is the spreading of spam emails, where the...