Data clustering in data mining is a domain that never goes out of focus. Clustering data has always been easy, but achieving the required accuracy, precision and performance has never been so easy. K-means, one of the oldest clustering algorithms, has been tested and experimented with thousands of times on a variety of datasets and in combination with other algorithms because of its robustness and simplicity, but what this...
The limited resources of wireless sensor networks (WSNs) and the long communication distance between sensors and the base station cause high energy consumption and consequently reduce network lifetime. Therefore, optimized energy consumption is one of the most important parameters in these networks. One way to reduce energy consumption is to cluster the network. In this study, a dynamic clustering...
Clustering is the task of dividing objects into groups based on their similarity. A clustering problem is solved optimally when objects of similar categories are joined in the same group. This study combines an Adaptive Genetic Algorithm, K-Means and Greedy Selection into a method named RAGKA to solve the clustering problem. In the first step, the centroids are determined by K-Means. Crossover and mutation are performed...
In this paper we present a sampling approach for running the k-means algorithm on large data sets. We propose a genetic algorithm that guides sampling by evaluating the fitness of each individual in the population with the k-means clustering algorithm. Although we want a partition with the lowest sum of squared errors (SSE), our algorithm tries to find the sample with the highest SSE. After finding a good sample, the remaining...
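For reference, SSE is the standard k-means objective. In our own notation (not taken from the paper), for a partition of the data into clusters $C_1, \dots, C_k$ with centroids $\boldsymbol{\mu}_j$:

```latex
\mathrm{SSE} = \sum_{j=1}^{k} \sum_{\mathbf{x} \in C_j} \lVert \mathbf{x} - \boldsymbol{\mu}_j \rVert^2,
\qquad
\boldsymbol{\mu}_j = \frac{1}{|C_j|} \sum_{\mathbf{x} \in C_j} \mathbf{x}
```

A lower SSE means points lie closer to their cluster centroids, which is why the final partition is judged by the lowest SSE.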
Hierarchical clustering is of enormous importance in data analytics, especially given the exponential growth of real-world data. Frequently these data are unlabelled and little prior domain knowledge is available. In this work, we aim to improve efficiency by introducing a set of methods, tested on synthetic and real data, that apply agglomerative hierarchical clustering followed by k-means...
Attribute-based data clustering has proven to be one of the efficient methods of data clustering. Set-theoretic approaches exist to handle attribute-based data clustering. MDDS, a soft-set-based technique, has proven its applicability in data clustering. However, a review of MDDS, whose calculations are based on comparing all constructed multi-soft sets, shows that it still suffers...
The proposed Pareto ranking scheme is intended for selecting parents and survivors in multi-objective evolutionary optimization. Pareto methods commonly use only dominance analysis to partially sort solutions, without taking into account the specific strength of the conflict detected between objectives. This can produce undesired effects, such as the loss of...
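For context, the dominance-only baseline described above can be sketched as plain non-dominated sorting. This is a minimal Python sketch assuming all objectives are minimized; the function names `dominates` and `pareto_ranks` are our own, and it is not the proposed ranking scheme, which additionally weighs the strength of conflict between objectives.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b (all objectives minimized):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(points: List[Sequence[float]]) -> List[int]:
    """Rank 0 = non-dominated front, rank 1 = next front once rank 0 is removed, etc."""
    ranks = [-1] * len(points)
    remaining = set(range(len(points)))
    rank = 0
    while remaining:
        # Current front: points not dominated by any other remaining point.
        front = [i for i in remaining
                 if not any(dominates(points[j], points[i]) for j in remaining if j != i)]
        for i in front:
            ranks[i] = rank
        remaining -= set(front)
        rank += 1
    return ranks


if __name__ == "__main__":
    objectives = [(1.0, 5.0), (2.0, 3.0), (3.0, 4.0), (4.0, 1.0), (5.0, 5.0)]
    print(pareto_ranks(objectives))  # [0, 0, 1, 0, 2]
```

The ranks only say which front a solution belongs to; two solutions on the same front are indistinguishable here, however strongly their objectives conflict.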
Clustering is one of the most widely studied problems in machine learning and data mining. The choice of clustering algorithm depends on the application scenario and the data domain. The K-Means algorithm, which relies on a distance measure, is one of the most popular clustering techniques. In this work, an extensive empirical evaluation of three significant variations of the K-Means algorithm is carried out on the basis...
The security of network resources, computer systems and data has become a major issue since the advent of the Internet and the threats that come with it. To ensure a good level of security, Intrusion Detection Systems (IDS) have been widely deployed, and many techniques for detecting, identifying and classifying attacks have been proposed, developed and tested, either offline or online. In this paper,...
Clustering is an unsupervised technique that partitions the entire input space into regions. The initial partitions have a great impact on the resulting clusters. In this paper, a new Multi-Stage Genetic Clustering (MSGC) scheme for multi-objective optimization in data clustering is proposed, which can automatically partition the data into an appropriate number of clusters. K-means is a well-known...
The K-means algorithm is sensitive to the initial cluster centers: clustering results differ with different initial inputs and can fall into a local optimum. Genetic algorithms are randomized search techniques that can provide a better solution for the fitness function of an optimization problem. This paper proposes an Enhanced K-means Genetic Algorithm for optimal clustering of data (EKMG)...
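The sensitivity to initial centers described above can be reproduced with plain Lloyd's k-means. In this minimal Python sketch (the function names and the toy 1-D data are our own, not from the paper), the same six points converge to different local optima depending only on the starting centers; it does not implement EKMG or any genetic step.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Point], init: List[Point], max_iter: int = 100):
    """Plain Lloyd's k-means from the given initial centers.
    Returns (centers, labels, sse)."""
    centers = [tuple(c) for c in init]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest center (ties go to the lowest index).
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members;
        # an empty cluster keeps its previous center.
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    data = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]
    print(kmeans(data, [(0.0,), (10.0,), (20.0,)])[2])  # 1.5
    print(kmeans(data, [(0.0,), (1.0,), (15.0,)])[2])   # 101.0
```

Starting from centers 0, 10 and 20 yields the natural three groups (SSE 1.5), while starting from 0, 1 and 15 merges the two right-hand groups and stops at SSE 101.0, a local optimum that further Lloyd iterations cannot escape.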
Most of the existing literature uses Euclidean-distance-based cluster validity measures to identify the correct number of clusters for different datasets, which is a very important consideration in clustering. Symmetry can be considered an important attribute for data clustering. It can be of two types: point symmetry and line symmetry. In this paper we introduce a newly developed line...
The traditional K-means algorithm is sensitive to the initial cluster centers: clustering results fluctuate with different initial inputs and easily fall into a local optimum. This paper proposes an optimized genetic K-means clustering algorithm that applies the encoding, initialization, fitness function selection, crossover and mutation of genetic algorithms to the clustering problem. Experiment...
An extension of principal component analysis called ipPCA was proposed earlier for analyzing structure in genetic data. This non-parametric framework iteratively classifies individuals into subpopulations. However, it is prone to false positives when dealing with large datasets and mixed-type genetic markers. We address these shortcomings by introducing a unified encoding scheme and suggesting...
The adaptive genetic algorithm (AGA), designed as a general optimization method, was applied to ambiguity resolution in pulse Doppler (PD) radar. A fitness function based on the squared error over consecutively ordered ranges from multiple PRFs (pulse repetition frequencies) was designed. The crossover operator and the GA termination conditions were discussed. The relations among the probability of ambiguity resolution of...
Understanding the genotype-phenotype association is a fundamental problem in genetics. A major open problem in mapping complex traits is identifying a set of interacting genetic variants (such as single nucleotide polymorphisms or SNPs) that influence disease susceptibility. Logic regression (LR) is a statistical approach that has been proposed to model interactions of SNPs. Several LR-based association...
Clinical data have traditionally been the major factor in cancer prognosis. However, this classic approach may be ineffective for analyzing morphologically indistinguishable tumor subtypes. As such, microarray technology has emerged as a promising alternative. Despite a large number of microarray studies, the actual clinical application of gene expression data analysis remains limited...
In the past, we proposed a time series segmentation approach that combines clustering, the Discrete Wavelet Transform (DWT) and a genetic algorithm to automatically find segments and patterns in a time series. In this paper, we propose a PIP-based evolutionary approach, which uses Perceptually Important Points (PIP) instead of DWT, to effectively adjust the length of subsequences...
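Perceptually Important Points are commonly identified top-down: the first and last points are kept, then the point farthest from the line joining its two adjacent PIPs is added repeatedly. This minimal Python sketch assumes the vertical-distance variant (Euclidean and perpendicular distances are also used in the PIP literature); the function names are our own, and it covers only PIP extraction, not the authors' evolutionary segmentation.

```python
from typing import List, Sequence


def vertical_distance(x: int, y: float, x1: int, y1: float, x2: int, y2: float) -> float:
    """Vertical gap between (x, y) and the straight line through (x1, y1) and (x2, y2)."""
    y_line = y1 + (y2 - y1) * (x - x1) / (x2 - x1)
    return abs(y - y_line)


def find_pips(series: Sequence[float], n_pips: int) -> List[int]:
    """Return the sorted indices of the first n_pips perceptually important points."""
    if n_pips < 2:
        raise ValueError("n_pips must be at least 2 (the two endpoints)")
    n = len(series)
    if n_pips >= n:
        return list(range(n))
    pips = [0, n - 1]  # the endpoints are always PIPs
    while len(pips) < n_pips:
        best_idx, best_dist = -1, -1.0
        # Scan the interior points of every gap between consecutive PIPs.
        for left, right in zip(pips, pips[1:]):
            for i in range(left + 1, right):
                d = vertical_distance(i, series[i], left, series[left], right, series[right])
                if d > best_dist:
                    best_idx, best_dist = i, d
        pips.append(best_idx)
        pips.sort()
    return pips


if __name__ == "__main__":
    series = [0, 1, 5, 2, 3, 8, 4, 0]
    print(find_pips(series, 3))  # [0, 5, 7]
    print(find_pips(series, 4))  # [0, 4, 5, 7]
```

Because PIPs are added in order of importance, truncating the list at different sizes gives coarser or finer summaries of the same series, which is what makes them usable for adjusting subsequence length.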
NSF1 is one of the newly discovered fermentation stress response proteins that play a crucial role in the adaptation of the yeast Saccharomyces cerevisiae to fermentation stress conditions. Using time-course microarray gene expression profiles of Saccharomyces cerevisiae (DBY7286) grown in YPD medium, we identified and mapped genes significantly correlated with NSF1 expression, hence producing a...
Gene expression (microarray) data have been widely used in bioinformatics. Expression data for a large number of genes from a small number of subjects are used to identify informative biomarkers that may predict or help diagnose certain disorders. More recently, increasing amounts of information about the underlying relationships among expressed genes have become available, and researchers have started...