Feature (gene) selection is an important preprocessing step for performing data mining on large-scale bioinformatics datasets. However, one known concern is that feature selection can sometimes give very different results when applied to very similar data sets. Ensemble gene selection is a promising new approach which may help resolve this concern, producing more stable gene lists and better classification...
Essential proteins profoundly affect cellular life, but identifying them experimentally is extremely time-consuming and labor-intensive. The goal of this paper is to identify the features that are crucial for discriminating protein essentiality and to build learning machines for prediction. We first collect features from a variety of sources. Then we adopt a backward feature selection method and...
Classical approaches to analyzing transcriptomic data usually produce average classification models that have very low reproducibility. In this work, genome-wide gene expression is considered through the activity of large regulatory networks. We introduce a new measure of regulatory influence based on the variations of expression of genes in a large inferred regulatory network. This methodology can...
Analyzing the changes in volatility is an important aspect in financial data analysis leading to effective estimation of risk and discovering underlying causes of such changes. While there is a rich literature in estimating implied and stochastic volatility in financial time series using traditional econometric methods, the application of machine learning methods such as sparse regression with temporal...
This paper considers the use of feature selection within the state detection module for an ocean turbine condition monitoring system. The goal is to reduce the quantity of data to be processed while maintaining or improving state detection capabilities. Five feature selection techniques (Chi-squared, Information Gain, Signal-To-Noise, AUC and PRC) are evaluated based on their effects on four widely...
Online algorithms allow data points to be processed sequentially, which is important for real-time applications. In this paper, we propose a novel online clustering approach based on a mixture of Dirichlet processes with Dirichlet distributions, which can be viewed as an extension of the finite Dirichlet mixture model to the infinite case. Our approach is based on nonparametric Bayesian analysis,...
The concept of Web 2.0 or "semantic web" has been getting more and more popular during the last half decade. The potential of very subtle yet important emergent semantics hidden in such environments calls for equally elegant and powerful methods to "mine" them. However, much of the previous work on model-based recommender systems for folksonomies considered user-to-resource and...
Different parts of an instance may be strong or weak indicators of the instance's label. We propose a new annotation strategy, where in addition to an instance's label, the annotator indicates parts of the instance that are rationales for its label. For two text classification tasks, we show that rationales provide a significant improvement in performance. Each instance (with or without rationales)...
Ordinal data classification (ODC) has a wide range of applications in areas where human evaluation plays an important role, ranging from psychology and medicine to information retrieval. In ODC the output variable has a natural order; however, there is no precise notion of the distance between classes. The recently proposed method for ordinal data, Kernel Discriminant Learning Ordinal Regression...
We propose a systematic approach to identify outliers in transactional data. First, we define a measure to estimate an outlying score for each transaction. Then, based on the estimated scores, we propose a probabilistic method that exploits the beta mixture model to automatically identify outliers. In contrast to existing transactional outlier detection methods, the approach that we propose does not...
This paper presents a model-based criterion for assessing the clustering results of spatial data, where both geometrical constraints and observation attributes are taken into account. An extra parameter is often used to control the importance of each characteristic. Since the values of both terms vary according to different realizations of data, it becomes essential to determine the...
Probabilistic latent semantic analysis (PLSA) has been widely used in the machine learning community. However, the original PLSAs are not capable of modeling real-valued observations and usually have severe problems with overfitting. To address both issues, we propose a novel, regularized Gaussian PLSA (RG-PLSA) model that combines Gaussian PLSAs and hierarchical Gaussian mixture models (HGMM). We...
This paper is concerned with the issue of online time series segmentation. This problem, common in a number of application areas, continues to receive increasing attention. The present article introduces a novel threshold-free sequential time series segmentation approach. It is based on the concurrent estimation of two models (a model with one regressive segment and a two-component temporal mixture...