This paper proposes an effective clustering algorithm for databases that serve as benchmark data sets in data mining applications. We present a Genetic Clustering Algorithm (GCA) that finds a globally optimal partition of a given data set into a specified number of clusters. The algorithm is distance-based and creates centroids. To evaluate the proposed algorithm, we use some artificial data sets and...
The development of mobile network technology provides great potential for social networking services. This paper studies data mining for social network analysis, aiming to find people's social network patterns by analyzing information about their mobile phone usage. In this research, the real database of MIT's Reality Mining project is employed. The classification model presented...
Because using the mobile internet has several limitations, mobile content personalization appears to be one way to enhance the mobile internet experience. In this paper, we propose a mobile content personalization framework to facilitate collaboration between the client and the server. This paper investigates clustering and classification techniques using K-means and Artificial Neural...
A more practical, efficient, and fast method of identifying raw food materials would help improve the current food security situation. To that end, this paper presents a method for discriminating vegetable oils based on an improved K-Means algorithm and the GC of the vegetable oil. The algorithm is improved in how it selects the initial cluster centers, so that the traditional...
Identifying network applications by port number is limited by the fact that more and more peer-to-peer applications use dynamic port numbers. Some clustering methods have been used to cope with this problem, but they are time-consuming. In this paper, a method based on the wavelet transform and K-means is proposed. Its basic idea is to preprocess the data with a wavelet transformation...
This paper presents a new dynamic data clustering algorithm based on K-means and combinatorial particle swarm optimization, called KCPSO. Unlike the traditional K-means method, KCPSO does not need a specific number of clusters given before performing the clustering process and is able to find the optimal number of clusters during the clustering process. In each iteration of KCPSO, a discrete PSO is...
To address the high false positive and missed detection rates in network intrusion detection, this paper proposes an intrusion detection method based on a clustering algorithm. The method applies the Fuzzy C-means clustering algorithm to the detection of network intrusions. By building an intrusion detection model, it carries out fuzzy partitioning and clustering of the data, and this...
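For readers unfamiliar with the fuzzy partition mentioned in the abstract above, the following is a minimal sketch of the standard Fuzzy C-means membership computation for a single point. It is not the paper's intrusion detection model: the function name `fcm_memberships`, the Euclidean distance, and the fuzzifier default `m=2.0` are assumptions made for illustration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Membership of one point in each cluster (standard FCM formula).

    Assumptions for this sketch: Euclidean distance, fuzzifier m > 1.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    zero = [i for i, d in enumerate(dists) if d == 0.0]
    if zero:
        return [1.0 / len(zero) if i in zero else 0.0
                for i in range(len(centers))]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); memberships sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists)
            for d_i in dists]
```

Unlike a hard assignment, each point receives a degree of membership in every cluster, so a borderline connection record can be partly "normal" and partly "attack".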
The paper presents a new graph based clustering algorithm. Traditional clustering algorithms have the drawback that they take a large number of iterations to arrive at the desired number of clusters. The advantage of this approach is that the size of the dataset is reduced using a graph based clustering approach and the required number of clusters is generated using the K-means algorithm. The proposed...
Document clustering is the process of partitioning a set of n unlabeled documents into clusters such that documents in each cluster share some common concepts. Each concept is conveniently represented by some key terms. Using words as features, text data are represented as vectors in a very high dimensional vector space. However, most documents are sparse vectors, for example, more than ten thousand...
Most clustering algorithms, such as k-means and fuzzy c-means (FCM), are used to cluster a set of objects based on a function of dissimilarities between objects. However, clustering on attribute variables of objects may give more cluster information. Thus, to have a clustering algorithm that can be designated to construct simultaneously an optimal partition of objects and also attribute variables...
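As a point of reference for the many K-means variants in this listing, here is a minimal sketch of plain K-means (Lloyd's iteration) clustering objects by Euclidean dissimilarity. It does not implement any of the listed papers' algorithms. The function name `kmeans`, the `iterations` cap, and the choice to seed the centroids with the first `k` points are assumptions made for illustration; the seeding in particular is deterministic but not robust.

```python
import math

def kmeans(points, k, iterations=100):
    """Plain K-means on a list of equal-length numeric tuples."""
    # Assumption: seed centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Stop once the partition no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(col) / len(members) for col in zip(*members)
                )
    return centroids, assignment

centroids, labels = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

The result depends on the initial centroids, which is why several of the abstracts here focus on better ways of choosing the starting centers or the number of clusters.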
Four neural signals are recorded from an anesthetized rat: one without stimulation and three under stimulation with a toothbrush, a pen shaft, and a needle. First, spectral subtraction is used to reduce noise and the nonlinear energy operator is adopted to detect spikes. Then, independent component analysis is performed with dynamic dimension increase to extract the features and form a feature vector. Finally, k-means...
In an unsupervised data set, features differ in importance. If each feature is given a proper weight that fully reflects its level of influence on the clustering result, the clustering result will be improved. A feature evaluation function is proposed, and a set of feature weight vectors is obtained by minimizing it, which is a multi-objective problem. So a fast and...
The aim of this paper is to present an industrial application of a new procedure for classification. The problem is solved by minimizing the distance between the components and the centers of the clusters. It is therefore critical to determine the best centers of the clusters. Once the centers of each class are determined, the rule of center neighbourhood is applied to assign an element to a class...
Text clustering is one of the difficult and active topics in Internet search engine research. Building on and improving K-means clustering techniques, a new text clustering algorithm is presented. First, texts are preprocessed to meet the needs of the subsequent processing. Second, the paper improves the gravity center (centroid) calculation method and the algorithm flow of the K-means clustering algorithm to improve efficiency and...
This paper proposes a fast clustering algorithm based on foregone samples for mixed data (FCABFS) for use in network anomaly detection. FCABFS obtains the initial cluster centers exactly by training on foregone samples; the cluster centers and dissimilarity are calculated by separating objects. The algorithm solves the problem of the higher false positive rate and the lower detection...
Ant clustering is an effective clustering method. Compared to other clustering methods, the ant clustering algorithm has one outstanding advantage and one disadvantage. The advantage is that the total number of clusters is generated automatically, and the disadvantage is that its clustering result is random and influenced by the input data and the parameters, which leads to low quality of...
Authentication by biometric verification is becoming increasingly common in corporate, public security and other such systems. A great deal of work has been done in the area of offline palmprints, such as palmprint segmentation, crease extraction, special areas, feature matching, etc. But to the best of our knowledge, no work has yet been done to extract and identify the right hand of a person, given his/her left...
This paper proposes a method to cluster documents of variable length. The main idea is to (a) automatically identify 1-, 2-, and 3-grams (to reduce the dependency on a huge background vocabulary, learning, or a complex probabilistic approach), (b) order them by some measure of relevance, which is developed with the help of Tf-Idf and a term-weighting approach, and finally (c) use them...
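Steps (a) and (b) of the abstract above can be sketched roughly as follows. This is an illustration under stated assumptions, not the paper's method: the relevance measure here is a plain TF-IDF score rather than the paper's combined Tf-Idf and term-weighting measure, tokenization is a simple lowercase whitespace split, and the names `ngrams` and `rank_terms` are hypothetical.

```python
import math
from collections import Counter

def ngrams(tokens, max_n=3):
    """All contiguous 1- to max_n-grams of a token list, as strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]

def rank_terms(documents, max_n=3):
    """For each document, list its n-grams ordered by a plain TF-IDF score."""
    # Assumption: lowercase whitespace tokenization.
    counts_per_doc = [
        Counter(ngrams(doc.lower().split(), max_n)) for doc in documents
    ]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for counts in counts_per_doc:
        df.update(counts.keys())
    n_docs = len(documents)
    ranked = []
    for counts in counts_per_doc:
        total = sum(counts.values())
        scores = {
            g: (c / total) * math.log(n_docs / df[g])
            for g, c in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked.append(sorted(scores, key=lambda g: (-scores[g], g)))
    return ranked
```

N-grams that occur in every document score zero and fall to the end of each list, so the top of each ranking holds the terms most specific to that document.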
Due to the explosion in the number of autonomous data sources, there is a growing need for effective approaches for distributed knowledge discovery and data mining. The distributed clustering algorithm is used to cluster the distributed datasets without necessarily downloading all the data to a single site. Many applications can benefit from soft clustering, where each object is assigned to multiple...
Unsupervised or supervised anomaly intrusion detection techniques have great utility in the context of network intrusion detection systems. However, the large number of labeled attack instances required by supervised approaches is difficult to obtain, which makes most existing supervised techniques hard to implement in the real world. Unsupervised methods are superior in their independence from prior...