In 2010, we proposed the improved unsupervised possibilistic clustering algorithm (IUPC), which can be run as an unsupervised clustering method and overcomes a weakness of the unsupervised possibilistic clustering algorithm (UPC): its tendency to generate coincident clusters. IUPC inherits the merits of UPC. At the same time, IUPC solves the coincident clusters problem of UPC by limiting the feasible regions...
With the advent of modern techniques for scientific data collection, large quantities of data are accumulating in various databases. Systematic data analysis methods are necessary to extract useful information from rapidly growing data banks. Cluster analysis is one of the major data mining methods, and the k-means clustering algorithm is widely used for many practical applications. But the...
Data mining has become an important topic in effective analysis of gene expression data due to its wide application in the biomedical industry. Within a gene expression matrix there are usually several particular macroscopic phenotypes of samples. Selection of genes most relevant and informative for certain phenotypes is an important aspect in gene expression analysis. Currently most of the research...
VDBSCAN is a well-known density-based clustering algorithm. Handling highly dense data points is a challenging task in clustering. The VDBSCAN algorithm handles data points of widely varying density well and also overcomes the problem of noise and outliers. But this algorithm depends on the input parameters Eps and Minpts. The careful selection of these input parameters plays an important role in proper...
This paper formulates, simulates and assesses an improved data clustering algorithm for mining web documents with a view to preserving their conceptual similarities and eliminating the problem of speed while increasing accuracy. The improved data clustering algorithm was formulated using the concept of the K-means algorithm. Real and artificial datasets were used to test the proposed and existing algorithm...
Experiments are carried out on datasets with different dimensions selected from UCI datasets by using two classical clustering algorithms. The results of the experiments indicate that when the dimensionality of the real dataset is less than or equal to 30, the clustering algorithms based on distance are effective. For high-dimensional datasets (dimensionality greater than 30), the clustering algorithms...
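To illustrate why the choice of Eps and Minpts matters, the sketch below computes a sorted k-distance list, a common heuristic for picking Eps in DBSCAN-style algorithms. It is a minimal illustration, not the method of the paper above: the function name `k_distances`, the toy points, and the choice k = 2 are all assumptions made for this example.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour, sorted descending.

    Hypothetical helper for illustration; k plays the role of Minpts.
    """
    result = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[k - 1])
    return sorted(result, reverse=True)

# Toy data (assumed): four points forming a dense unit square plus one far outlier.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
kd = k_distances(points, k=2)
print(kd)
```

On this toy data the outlier has a much larger k-distance than the four dense points. A sharp drop of this kind in the sorted list is what the heuristic looks for when suggesting a value for Eps.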
Regression problems on massive data sets are ubiquitous in many application domains including the Internet, earth and space sciences, and finances. In many cases, regression algorithms such as linear regression or neural networks attempt to fit the target variable as a function of the input variables without regard to the underlying joint distribution of the variables. As a result, these global models...
Simplified Silhouette Filter (SSF) is a recently introduced feature selection method that automatically estimates the number of features to be selected. To do so, a sampling strategy is combined with a clustering algorithm that seeks clusters of correlated (potentially redundant) features. It is well known that the choice of a similarity measure may have a great impact on clustering results. As a consequence,...
Although the fuzzy k-modes algorithm has removed the numeric-only limitation of the k-means algorithm, representing each attribute of the centroid with a single category value and using a simple distance measure compromise its precision and make it prone to falling into local optima. In this paper, an extended fuzzy k-means (xFKM) algorithm for clustering categorical-valued data is presented, in which...
We introduce a new fuzzy relational clustering technique with Local Scaling Parameter Learning (LSPL). The proposed approach learns the underlying cluster-dependent dissimilarity measure while finding compact clusters in the given data set. The learned measure is a Gaussian similarity function defined with respect to each cluster, which makes it possible to control the scaling of the clusters and thus improve...
Feature weighting plays an important role in improving the performance of clustering techniques. We propose an automated feature weighting in fuzzy declustering-based vector quantization (FDVQ), namely the AFDVQ algorithm, for enhancing effectiveness and efficiency in classification. The proposed AFDVQ imposes weights on the modified fuzzy c-means (FCM) so that it can automatically calculate feature weights...
In many application domains such as information retrieval, computational biology, and image processing, the data dimension is usually very high. Developing effective clustering methods for high-dimensional datasets is a challenging problem due to the curse of dimensionality. The k-means clustering algorithm is used for many practical applications. But it is computationally expensive and the quality...
In traditional machine learning applications, only labeled data is used to train the classifier. In several real applications, labeled data are difficult, expensive, and time-consuming to obtain, and require human experts. Semi-supervised learning addresses this issue. Semi-supervised learning uses a large amount of unlabeled data, combined with the labeled data, to build better classifiers. The semi-supervised...
A new methodology to learn descriptive linguistic Fuzzy Rule-based System Knowledge Bases from examples based on the combination of fuzzy clustering and evolutionary simultaneous rule selection and membership functions tuning is presented in this work. Fuzzy clustering is used to achieve a preliminary description of the data, in other words to obtain information on the definition of the linguistic...
Based on the clonal selection principle and the immunodominance theory, a new immune clustering algorithm, the Immunodominance based Clonal Selection Clustering Algorithm (ICSCA), is proposed in this paper. An immunodominance operator is introduced into the clonal selection algorithm, which enables on-line acquisition of prior knowledge and information sharing among different antibodies. The proposed method has been...
This paper presents a clustering ensemble method based on our novel three-staged clustering algorithm. A clustering ensemble is a paradigm that seeks to best combine the outputs of several clustering algorithms with a decision fusion function to achieve a more accurate and stable final output. Our ensemble is constructed with our proposed clustering algorithm as a core modelling method that is used...
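As an illustration of what a fusion function for a clustering ensemble can look like, the sketch below builds a co-association matrix (the fraction of base clusterings in which two points share a cluster) and links points whose co-association exceeds a threshold. This is one commonly used consensus scheme, assumed here purely for illustration. It is not the three-staged algorithm or the fusion function of the paper above, and the names `co_association`, `consensus_clusters`, and the 0.5 threshold are hypothetical.

```python
def co_association(labelings):
    """Fraction of base clusterings in which each pair of points shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]

def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph linking pairs with co-association > threshold."""
    n = len(matrix)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and matrix[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Three assumed base clusterings of five points; label ids are arbitrary per run.
labelings = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
matrix = co_association(labelings)
consensus = consensus_clusters(matrix)
print(consensus)
```

Because the matrix only records whether two points were grouped together, the arbitrary label ids of the individual runs do not need to be aligned before fusing them.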
In this paper we present a novel segmentation approach that performs fuzzy clustering and feature extraction. The proposed method consists in forming a new descriptor combining a set of texture sub-features derived from the Grating Cell Operator (GCO) responses of an optimized Gabor filter bank, and Local Binary Pattern (LBP) outputs. The new feature vector offers two advantages. First, it only considers...
Because the K-means clustering algorithm is sensitive to the choice of the initial cluster centers, makes it difficult to determine the number of clusters, and is easily affected by isolated points, we propose the K-means Multiple Clustering Method Based on Pseudo Parallel Genetic Algorithm. The method adopts a variable-length, real-coded chromosome strategy. Through the introduction of chromosome retreading...
Accurate traffic classification is critical in network security monitoring and traffic engineering. To overcome the deficiencies of traditional traffic classification methods with port mapping and signature matching, several machine learning techniques were proposed. However, there are two main challenges for classifying network traffic using machine learning methods. Firstly, labeled samples are scarce...
In the cluster analysis process used in data mining, which enables the extraction of interesting data patterns from datasets, accuracy and efficiency play a pivotal role. Scatter/Gather is a cluster-based browsing model, and most previous work on this model has focused on the efficiency of the clustering algorithm. In this paper we present an algorithm which could improve the accuracy of the...
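The sensitivity of K-means to its initial cluster centers can be seen in a small sketch of the standard Lloyd iteration. This is plain K-means only, not the genetic-algorithm method of the paper above. The function name `kmeans`, the toy rectangle, and both sets of starting centers are assumptions chosen for the example.

```python
import math

def kmeans(points, centers, max_iter=100):
    """Plain Lloyd iteration from the given initial centers.

    Returns the final centers and the sum of squared errors (SSE).
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    sse = sum(min(math.dist(p, c) for c in centers) ** 2 for p in points)
    return centers, sse

# Toy data (assumed): corners of a 4-by-1 rectangle.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

good_centers, good_sse = kmeans(points, [(0.0, 0.0), (4.0, 0.0)])
bad_centers, bad_sse = kmeans(points, [(2.0, 0.0), (2.0, 1.0)])
print(good_sse, bad_sse)
```

Starting from one center on each short side, the iteration settles on the left/right split. Starting from the midpoints of the long sides, it stops immediately at the top/bottom split, a local optimum with a much larger SSE.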