Recently, liquid chromatography coupled to mass spectrometry (LC-MS) has become a standard technique for identifying differential abundance of peaks as biomarkers. Two major problems in the preprocessing of LC-MS data analysis are how to adjust and align multiple LC-MS datasets efficiently and correctly. Hence, an effective algorithm is needed to adjust the variation in retention time and align protein...
In this paper a hierarchical structure is proposed for automatic gender identification (AGI). Two clustering techniques are used in this structure. The first is divisive clustering, which divides the speakers of each gender into several classes. The second is agglomerative clustering, which creates the hierarchical structure. Feature reduction is done by SOAP feature...
Data mining has been defined as "The nontrivial extraction of implicit, previously unknown, and potentially useful information from data". Clustering is the automated search for groups of related observations in a data set. The K-Means method is one of the most commonly used clustering techniques for a variety of applications. This paper proposes a method for making the K-Means algorithm...
Data mining has become an important topic in effective analysis of gene expression data due to its wide application in the biomedical industry. Within a gene expression matrix there are usually several particular macroscopic phenotypes of samples. Selection of genes most relevant and informative for certain phenotypes is an important aspect in gene expression analysis. Currently most of the research...
An effective XML clustering method, the neighbor center clustering (NCC) algorithm, is presented in this paper. Its similarity measure draws on both the structural and the content information contained in XML files. Structural similarity is measured using the idea of the Longest Common Subsequence, while content similarity is computed using TF-IDF principles. It reduces computational complexity by avoiding direct...
We propose a clustering algorithm based on a structural-prior-based Local Factor Analysis (spLFA) model under Bayesian Ying-Yang harmony learning, which automatically determines the hidden dimensionalities during parameter learning, reduces the number of free parameters by projecting the mean vectors onto a low-dimensional manifold, and imposes sparseness through a Normal-Jeffreys prior. Experiments...
VDBSCAN is a well-known density-based clustering algorithm. Handling highly dense data points is a challenging task in clustering. The VDBSCAN algorithm handles data points of widely varying density well and also overcomes the problem of noise and outliers. However, the algorithm depends on the input parameters Eps and Minpts. Careful selection of these input parameters plays an important role in proper...
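The structural side of such a similarity can be illustrated with a small sketch. It is not the NCC algorithm itself: it assumes each XML document has already been flattened into a sequence of tag names, and the normalization by the longer sequence length is an illustrative choice, not something stated in the abstract. The function names are hypothetical.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two sequences."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the DP table
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def structural_similarity(tags_a, tags_b):
    """LCS length normalized by the longer sequence (assumed normalization)."""
    if not tags_a and not tags_b:
        return 1.0
    return lcs_length(tags_a, tags_b) / max(len(tags_a), len(tags_b))


doc_a = ["article", "title", "author", "body"]
doc_b = ["article", "author", "body"]
print(structural_similarity(doc_a, doc_b))
```

Two documents with identical tag sequences score 1.0, and documents that share no tags score 0.0.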
This paper proposes a new self-growing Bayesian network classifier for online learning of human motion patterns (HMPs) in dynamically changing environments. The proposed classifier is designed to represent HMP classes based on a set of historical trajectories labeled by unsupervised clustering. It then assigns HMP class labels to current trajectories. Parameters of the proposed classifier are recalculated...
This paper formulates, simulates and assesses an improved data clustering algorithm for mining web documents with a view to preserving their conceptual similarities and eliminating the problem of speed while increasing accuracy. The improved data clustering algorithm was formulated using the concept of K-means algorithm. Real and artificial datasets were used to test the proposed and existing algorithm...
Experiments are carried out on datasets with different dimensions selected from UCI datasets by using two classical clustering algorithms. The results of the experiments indicate that when the dimensionality of the real dataset is less than or equal to 30, the clustering algorithms based on distance are effective. For high-dimensional datasets (dimensionality greater than 30), the clustering algorithms...
Most quantitative cell image-based screening analyses are dependent on thorough user supervision based on assay-specific knowledge. To minimize human bias in analysis, we introduce an automated methodology of displaying screen phenotypes using clustering that provides intuitive visuals to guide user supervision when required. Our premise is to automatically present to users an overview of screen phenotype-contents...
This paper compares hard and soft updating centroids for clustering Y-STR data. Hard centroids are represented by the New Fuzzy k-Modes clustering algorithm, whereas soft centroids are represented by the k-Population algorithm. The two algorithms are evaluated on two datasets, Y-STR haplogroups and Y-STR Surnames. The results show that the soft centroid performance is better than the hard centroid...
Among the large number of genes present in microarray data, only a small fraction are effective for performing a certain diagnostic test. However, it is very difficult to identify these genes for disease diagnosis. In this regard, a new supervised gene clustering algorithm is proposed to cluster genes from microarray data. The proposed method directly incorporates the information of response...
The performance of an Automatic Speech Recognition (ASR) system degrades greatly when speech is corrupted by noise. In the spectrogram representation of a speech signal, deleting low-SNR elements yields an incomplete spectrogram. In this case, the speech recognizer should modify the spectrogram to restore the missing elements, which is one direction. In another direction, the speech recognizer...
Detecting outliers and relevant features is the most important step before classification. In this paper, a novel semi-supervised k-means clustering is proposed for outlier detection in mammogram classification. Initially the shape features are extracted from the digital mammograms, and k-means clustering is applied to cluster the features; the number of clusters is equal to the number of...
Constrained clustering through matrix factorization has been shown to largely improve clustering accuracy by incorporating prior knowledge into the factorization process. Although this approach has been well studied, no existing method deals with constrained multi-way data factorization. Multi-way data, or tensors, are encoded as high-order data structures. They can be seen as the generalization of matrices. One typical...
In this paper we propose a novel approach for introducing semantic relations into the bag-of-words framework for recognizing human actions. We represent visual words in two different views: the original features and the document co-occurrence representation. The latter view conveys semantic relations but is large, sparse and noisy. We use canonical correlation analysis between the two views to find...
In this paper, we propose a dynamic two-stage technique for selecting the most informative samples in classification problems: the first stage conducts sample selection in batch off-line mode based on unsupervised criteria extracted from cluster partitions; the second stage uses an active learning scheme during on-line adaptation of classifiers in non-stationary environments. This...
The ultimate goal in a multiple classifier system (MCS) is to obtain a global and more accurate model through the combination of several base learners. Among the popular combining rules, averaging has been emphasized as a well-qualified option. The averaging rule can be applied with an equal (simple averaging) or non-equal (weighted averaging) weight vector for the linear combination. When the formed...
Growing Self Organizing Map (GSOM) has proven benefits in text clustering. Latent Semantic Analysis (LSA) has also been used in text clustering to capture the latent concepts from text. This paper presents a novel combination of GSOM and LSA to improve text clustering results compared to using GSOM on its own. LSA is an inherently global algorithm that looks at trends and patterns globally and GSOM...
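The difference between simple and weighted averaging can be illustrated with a minimal sketch. It assumes each base learner outputs a list of class probabilities; the function name `combine` and the example numbers are hypothetical and are not taken from the paper.

```python
def combine(probabilities, weights=None):
    """Linearly combine per-classifier class probabilities.

    probabilities: one list of class probabilities per base learner.
    weights: one non-negative weight per base learner; None means
             simple averaging (equal weights).
    """
    n = len(probabilities)
    if weights is None:
        weights = [1.0] * n  # simple averaging: every learner counts equally
    total = sum(weights)     # normalize so the weights sum to one
    n_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(n_classes)
    ]


outputs = [[0.8, 0.2], [0.4, 0.6]]      # two base learners, two classes
simple = combine(outputs)                # equal weights
weighted = combine(outputs, [3.0, 1.0])  # first learner trusted three times more
print(simple, weighted)
```

With equal weights the two learners disagree mildly and the combined score sits halfway between them; raising the first learner's weight pulls the combination toward its output.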