Traffic classification has become a crucial domain of research due to the rise in applications that are either encrypted or tend to change ports continually. The challenge of flow classification is to determine the applications involved without any information on the payload. In this paper, our goal is to achieve robust and reliable flow classification using data mining techniques. We propose a...
In this paper we propose a web log mining-based scheme for analyzing network user behavior, which plays an important role in network structure optimization and website server configuration. Based on clustering and regression models, we study the visit patterns of network users at a university by analyzing a large amount of web log data collected from the university campus network. The data analyzing...
This paper first studies methods of web document mining and text clustering and summarizes the fuzzy clustering algorithms and similarity measure functions, then proposes a modified similarity function that can solve the problems of feature selection and feature extraction in high-dimensional space. Finally, this paper puts forward a dynamic fuzzy clustering algorithm (DCFCM) by combining...
With the rapid development of the Internet, e-commerce websites now routinely have to work with log datasets that are up to a few terabytes in size. How to remove messy data promptly and at low cost, and how to extract useful information, is a problem we have to face. The mining process involves several steps, from pre-processing the raw data to establishing the final models. In this paper we describe our method to...
This paper formulates, simulates and assesses an improved data clustering algorithm for mining web documents with a view to preserving their conceptual similarities and eliminating the problem of speed while increasing accuracy. The improved data clustering algorithm was formulated using the concept of the K-means algorithm. Real and artificial datasets were used to test the proposed and existing algorithm...
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. In many cases, regression algorithms such as linear regression or neural networks attempt to fit the target variable as a function of the input variables without regard to the underlying joint distribution of the variables. As a result, these global models...
Image annotation is a promising approach to bridging the semantic gap between low-level features and high-level concepts, and it can avoid heavy manual labor. Most existing automatic image annotation approaches are based on supervised learning. They often encounter several problems, such as insufficient training data, an inability to deal with new concepts, and a limited number of semantic...
Nowadays, using the Web and the Internet as a worldwide information system confronts us with vast amounts of data. Tools for web-level data processing that help people intelligently transform these data into useful knowledge are therefore important. Clustering web pages is one such technique. In this paper, a new algorithm is presented to cluster...
This paper offers an overview of the concept of "personalization" applied to e-Learning processes. We also introduce a Personalized Resource Recommendation System (PRRS) for e-Learning that uses data mining techniques. The PRRS has four sub-modules: Learner Model, Learning Materials Clustering, Personalized Recommendation and Personalized Evaluation. PRRS is proposed for the purpose...
Inspired by a huge amount of empirical study of real world networks such as the Internet, the Web, as well as various social and biological networks, researchers have in recent years developed several random graph models to help us to understand the most fundamental properties of these systems. Simple characteristics observed in many real world networks are 1.) a high clustering coefficient, i.e.,...
Co-clustering can be viewed as a two-way (bilinear) factorization of a large data matrix into dense/uniform and possibly overlapping sub-matrix factors (co-clusters). This combinatorially complex problem emerges in several applications, including behavior inference tasks encountered with social networks. Existing co-clustering schemes do not exploit the fact that overlapping factors are often sparse,...
As a programming model, MapReduce is widely used for processing and generating large data sets distributed across a large number of machines. With its widespread use, its validity and other major properties need to be analyzed in a formal framework. In this paper, a formal model is presented using the CSP method. We focus on the dominant parts of MapReduce and formalize them in detail...
This paper discusses the two important phases of Web transaction clustering analysis: data preprocessing and clustering analysis. To obtain an easily interpreted clustering result, we introduce the "Concept URL" in the data preprocessing phase. In the clustering analysis phase, an artificial ant model is set up. Based on this model, we implement an ant-colony clustering...
MapReduce has emerged as a model of choice for supporting modern data-intensive applications. The model is easy-to-use and promising in reducing time-to-solution. It is also a key enabler for cloud computing, which provides transparent and flexible access to a large number of compute, storage and networking resources. Setting up and operating a large MapReduce cluster entails careful evaluation of...
With the extensive growth of data available on the Internet, personalization of this huge volume of information becomes essential. Although there are various personalization techniques, in this paper we concentrate on using data mining algorithms to personalize web sites' usage data. This paper proposes an off-line, model-based web usage mining approach in which the model is generated by a clustering algorithm. Then, we will...
In this paper, we analyze the network structure of two SNSs, academic community system (ACS) and Amippy. From the viewpoint of network topology, the major characteristics of these data sets can be summarized as follows: low average shortest-path length, high clustering coefficient, presence of a power law degree distribution and negative assortativity. Based on our analysis, we propose a growth model...
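Two of the topological measures named in the abstract above, the clustering coefficient and the average shortest-path length, can be sketched on a toy graph as follows. This is only an illustration of the standard definitions for an undirected, connected graph; the graph `toy` and the function names are hypothetical and are not taken from the paper or its data sets.

```python
from collections import deque


def local_clustering(adj, node):
    """Fraction of pairs of neighbors of `node` that are themselves linked."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0  # fewer than two neighbors: no pair to check
    links = sum(
        1
        for i in range(k)
        for j in range(i + 1, k)
        if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, reachable nodes."""
    total, pairs = 0, 0
    for source in adj:
        # Breadth-first search gives hop distances in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


# Hypothetical toy graph: a triangle a-b-c with one extra node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(average_clustering(toy))
print(average_shortest_path_length(toy))
```

For real network data, a graph library would normally replace these hand-written loops; the sketch only makes the two definitions concrete.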
Because of today's explosion of information on the Internet, people encounter large amounts of new information at every moment, so analyzing this non-stationary information is becoming more and more important. Clustering analysis is a good information analysis method, but many clustering algorithms suit only stationary situations. In this paper, a novel incremental clustering algorithm based on self-organizing-mapping-IGSOM...
In the environment of data integration over the Internet, the remote server's contention states directly affect the cost of a data query, so determining the server contention states plays an important role in estimating the cost of a query. This paper uses sample queries and the k-means algorithm to determine the remote server's contention states, and get the response cost of the server, then...
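As a rough illustration of how k-means could group sample-query response times into contention states, here is a minimal one-dimensional sketch. It is not the paper's method, whose details are truncated in this listing; the sample values, the choice of three states, the evenly spread initialization, and the names `kmeans_1d` and `response_times` are all assumptions made for the example.

```python
def kmeans_1d(values, k, iterations=100):
    """Lloyd's k-means on scalar values (requires k >= 2); returns (centroids, labels)."""
    ordered = sorted(values)
    # Assumed deterministic initialization: spread starting centroids
    # evenly across the sorted data instead of picking them at random.
    centroids = [ordered[(i * (len(ordered) - 1)) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: abs(v - centroids[c])) for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c]
            )
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return centroids, labels


# Hypothetical response times (seconds) of repeated sample queries.
response_times = [0.11, 0.12, 0.10, 0.52, 0.49, 0.55, 1.9, 2.1, 2.0]
centroids, labels = kmeans_1d(response_times, k=3)
print(centroids)
print(labels)
```

Under these assumptions, each cluster stands for one contention state (for example low, medium and high load), and a cluster's centroid could serve as a representative response cost for that state.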
The three most important issues for anomaly-detection-based intrusion detection systems that use data mining methods are feature selection, data value normalization, and the choice of data mining algorithms. In this paper, we primarily study the feature selection of network traffic and its impact on the detection rates. We use the KDD CUP 1999 dataset as the sample for the study. We group the features of...