Speeches delivered in the French Parliament by deputies and government members are analysed, similarities between individuals are derived from the words they use, and finally deputies are grouped through hierarchical clustering. Similarity measures between political individuals are compared on a classification task: assigning a party to each actor. Finally, this analysis leads to a new organisation...
A semi-supervised approach for classification of network flows is analyzed and implemented. This traffic classification methodology uses only flow statistics to classify traffic. Specifically, a semi-supervised method is used that allows classifiers to be designed from training data consisting of only a few labeled and many unlabeled flows. The approach consists of two steps, clustering and classification...
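The two steps named in the abstract above, clustering followed by classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the use of k-means, the deterministic farthest-point initialisation, the majority-vote mapping from clusters to classes, and the two-feature toy flows are all assumptions made for illustration.

```python
from collections import Counter

def dist2(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centers):
    """Index of the center closest to the given point."""
    return min(range(len(centers)), key=lambda c: dist2(point, centers[c]))

def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        # Pick the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iters):
        assign = [nearest(p, centers) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centers

def label_clusters(points, centers, labeled):
    """Map each cluster to the majority label of the labeled flows inside it."""
    votes = {}
    for idx, label in labeled.items():
        votes.setdefault(nearest(points[idx], centers), Counter())[label] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

def classify(flow, centers, cluster_labels):
    """Assign a flow the label of its nearest cluster, or 'unknown'."""
    return cluster_labels.get(nearest(flow, centers), "unknown")

# Hypothetical flow statistics: two made-up features per flow.
flows = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.3),
         (8.0, 8.0), (8.2, 7.9), (7.9, 8.3), (8.1, 8.1)]
labeled = {0: "web", 4: "p2p"}  # only two of the eight flows carry labels

centers = kmeans(flows, k=2)
cluster_labels = label_clusters(flows, centers, labeled)
print(classify((1.0, 0.9), centers, cluster_labels))
print(classify((8.0, 8.2), centers, cluster_labels))
```

Clusters that contain no labeled flow map to "unknown" in this sketch, which is one possible way to flag traffic that matches none of the known classes.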
In this paper, a novel face annotation framework is proposed that systematically leverages context information such as situation awareness information with current face recognition (FR) solutions. In particular, unsupervised situation and subject clustering techniques have been developed that are aided by context information. Situation clustering groups together photos that are similar in terms of...
Extracting useful information from user-generated text on the web is an important ongoing research topic in natural language processing, machine learning, and data mining. Online tools like emails, news groups, blogs, and web forums provide an effective communication platform for millions of users around the globe and also provide an added advantage of anonymity. Millions of people post information on different...
Clustering is considered the most important unsupervised learning problem. It aims to find some structure in a collection of unlabeled data. Dealing with a large quantity of data items can be problematic because of time complexity. On the other hand, high-dimensional data, e.g. time series data, is a challenging area in data clustering. Novel algorithms need to be robust, scalable, efficient...
Cat swarm optimization (CSO) is a new heuristic optimization algorithm based on swarm intelligence. Previous research shows that this algorithm performs better than other heuristic optimization algorithms, namely particle swarm optimization (PSO) and weighted PSO, in function minimization. In this research a new CSO algorithm for the clustering problem is proposed...
A common problem in biology is to partition a set of experimental data into clusters in such a way that the data points within the same cluster are highly similar while data points in different clusters are very different. In this direction, clustering microarray time-series data via pairwise alignment of piece-wise linear profiles has been recently introduced. We propose an EM clustering approach...
This paper addresses an important and vital problem within the general area of disease recognition, namely identifying disease biomarker genes. Given the complexity of this domain, the basic idea pursued in this paper is to employ multiple agents to handle this problem. Though the developed methodology is general enough to be applied to any other domain, we concentrate on identifying cancer biomarkers...
In feature gene selection, the filtering model focuses on classification accuracy while ignoring the gene redundancy problem. On the other hand, gene clustering finds correlated genes without considering their predictive abilities. It is valuable to improve the performance of each with the help of the other. We report a new feature gene extraction algorithm, namely double-thresholding extraction of feature gene (DEFG),...
As a newly proposed clustering algorithm based on a random fuzziness model, RFKM has improved performance compared with other fuzzy clustering algorithms. However, the low mobility of accuracy can lead to a locally optimal solution. To solve this problem, we present an Entropy-based RFKM (ERFKM) algorithm. Meanwhile, in order to better facilitate the optimal operation of the ERFKM, this paper applies entropy...
In this paper, we propose a memetic algorithm (MA) for classifier optimization based on a clustering method that applies the k-means algorithm over a specific derived space. In this space, each classifier or individual is represented by the set of the accuracies of the classifier for each class of the problem. The proposed sensitivity clustering is able to obtain groups of individuals that perform...
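The derived space described above, in which each classifier is represented by its per-class accuracies, can be sketched as follows. The function name, the toy labels, and the convention of returning 0.0 for a class with no examples are assumptions for illustration and are not taken from the paper.

```python
def sensitivity_vector(y_true, y_pred, classes):
    """Per-class accuracy of one classifier: for each class, the fraction
    of examples of that class that the classifier predicted correctly."""
    vec = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        # Assumed convention: a class with no examples contributes 0.0.
        vec.append(correct / len(idx) if idx else 0.0)
    return vec

# Toy ground truth and the predictions of two hypothetical classifiers.
classes = ["a", "b", "c"]
y_true = ["a", "a", "b", "b", "b", "c"]
pred_1 = ["a", "b", "b", "b", "a", "c"]
pred_2 = ["a", "a", "b", "c", "c", "a"]

# Each classifier becomes one point in a space with one axis per class.
vec_1 = sensitivity_vector(y_true, pred_1, classes)
vec_2 = sensitivity_vector(y_true, pred_2, classes)
print(vec_1, vec_2)
```

Vectors like these would then be the input to k-means, so that classifiers with similar per-class behaviour fall into the same group.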
Distinguishing potential new cluster data from outliers is a main problem in mining new patterns from evolving data streams. Meanwhile, all the clustering algorithms inherited from the CluStream framework are distribution-based learning methods realized via a sliding window, so this problem becomes more obvious. This paper proposes a three-step clustering algorithm, rDenStream, based on DenStream, which...
Collaborative filtering has been very successful in both research and applications. Current collaborative filtering based on clustering computes over the whole set of items during clustering or nearest-neighbor selection, because researchers believed that if users have similar preferences on some items, they will have similar preferences on other items. But we think that users have...
The number of XML documents is increasing rapidly. In order to analyze the information represented in XML documents efficiently, research on XML document clustering is actively in progress. The key issue is how to devise the similarity measure between XML documents to be used for clustering. Since XML documents have hierarchical structure, it is not appropriate to cluster them by using a general...
Traditional application identification based on port numbers has become increasingly inaccurate. A more accurate alternative is to inspect the application payloads of traffic flows. The main drawback of this method is that target applications must be manually analyzed beforehand. Another alternative is to exploit the distinctive statistical properties of traffic flows and apply machine learning techniques...
In this paper, a new framework to build an adaptive classifier is introduced. At first, a clustering algorithm, density-based spatial clustering of applications with noise (DBSCAN), is applied to a set of sample data to form an initial set of clusters. The clusters are represented as classes. Using a support vector machine (SVM), a classifier model is generated. In real-world applications, data comes in...
Perspective deformation is one of the main issues that needs to be addressed in real-scene character recognition. An effective recognition approach, which is able to handle severe perspective deformation, is to employ cross ratio spectrum and dynamic time warping techniques. However, this solution suffers from a time complexity of O(n^4). In this paper, a clustering-based indexing method is proposed to...
We propose a novel algorithm based on clustering to extract rules from artificial neural networks. After the networks have been trained and pruned successfully, inner-rules are generated by discrete activation values of hidden units. Then, weights between input and hidden units are clustered to decrease the complexity of rule extraction. In the clustering phase, the clustered number of weights can be adjusted...
The Possibilistic Latent Variable (PLV) clustering algorithm is a powerful tool for the analysis of complex datasets due to its robustness toward data distributions of different types and its ability to accurately identify the inherent clusters within the data. The scaling coefficient in the PLV algorithm plays a key role in reducing the effects of noise, thereby improving the precision of the clustering...
Clustering for better representation of the diversity of text or image search results has been studied extensively. In this paper, we extend this methodology to the novel domain of music search. We conduct an empirical evaluation of different clustering algorithms, audio feature representations, and the incorporation of lyrics for music clustering. Our evaluation shows the fusion of audio and text features...