Finding the location of binding sites in DNA is a difficult problem. Although the locations of some binding sites have been experimentally identified, other parts of the genome may or may not contain binding sites. This poses problems with negative data in a trainable classifier. Here we show that using randomized negative data gives a large boost in classifier performance when compared to the original...
Recently, the so-called Support Feature Machine (SFM) was proposed as a novel approach to feature selection for classification, based on minimisation of the zero norm of a separating hyperplane. We propose an extension for linearly non-separable datasets that allows a direct trade-off between the number of misclassified data points and the number of dimensions. Results on toy examples as well as...
A large number of global and intrinsic folding features are usually extracted from precursor miRNAs (pre-miRNAs), some of which are redundant or useless. Therefore, it is essential to select the most representative feature subset, which helps improve classification efficiency. We propose a novel feature selection method based on a genetic algorithm. The information gain of feature...
It is known that Logistic Regression coupled with Partial Least Squares dimension reduction (PLSDR-LD) is capable of extracting a great deal of useful information for classification from gene expression profiles and achieving a rather high classification accuracy. In this study, we replace the logistic function of Logistic Regression with several functions which are similar to the logistic function...
Mass spectrometry (MS) data has been widely analyzed for the detection of early stage cancers. Its potential for seeking proteomic biomarkers has received a great deal of attention in recent years. In the sparse representation classification (SRC) framework, a testing sample is represented as a sparse linear combination of training samples. The coefficient vector of representation is obtained by a...
In this paper we present a new approach for classification of microarray data. Our methodology consists of two steps: attribute selection, which aims to select the most informative genes, and classification of expression profiles, which is carried out by weighted voting, a novel instance-based classifier based on Rough Set Theory. Attribute selection consists of two stages - initial selection,...
High accuracy sequence classification often requires the use of higher order Markov models (MMs). However, the number of MM parameters increases exponentially with the range of direct dependencies between sequence elements, thereby increasing the risk of overfitting when the data set is limited in size. We present abstraction augmented Markov models (AAMMs) that effectively reduce the number of numeric...
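The exponential growth mentioned in the abstract above can be made concrete with a small sketch. It assumes a plain order-k Markov model over a finite alphabet, where each of the |A|^k contexts carries |A| - 1 free transition probabilities; the function name `markov_param_count` is hypothetical, and the count ignores the initial-state distribution. It does not model the AAMM construction itself.

```python
def markov_param_count(alphabet_size: int, order: int) -> int:
    """Free transition parameters of an order-k Markov model.

    Assumption: one categorical distribution per context of length
    `order`, each with (alphabet_size - 1) free probabilities.
    The initial-state distribution is not counted.
    """
    if alphabet_size < 2:
        raise ValueError("alphabet_size must be at least 2")
    if order < 0:
        raise ValueError("order must be non-negative")
    contexts = alphabet_size ** order
    return contexts * (alphabet_size - 1)


# Compare DNA (4 letters) with protein sequences (20 letters).
for k in range(5):
    print(k, markov_param_count(4, k), markov_param_count(20, k))
```

Under these assumptions, moving from DNA to a 20-letter protein alphabet makes the growth far steeper, which is one way to see why limited data quickly becomes a problem at higher orders.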
We have recently found that the computation time of homology-based subcellular localization can be substantially reduced by aligning profiles up to the cleavage site positions of signal peptides, mitochondrial targeting peptides, and chloroplast transit peptides [1]. While the method can reduce the profile alignment time by as much as 20-fold, it cannot reduce the computation time spent on creating...
This study compared principal component (PC) and partial least squares (PLS) regression, the former an unsupervised and the latter a supervised gene component analysis, for highly complex and correlated microarray gene expression profiles. Projection of the derived classifiers onto independent samples for clinical phenotype prediction was evaluated as well. Previous studies had suggested that PLS...
When proposing a new classification scheme, perhaps in the form of a classification rule or feature selection method, modelers in the bioinformatics literature typically report its performance on data sets of interest, such as gene-expression microarrays. These data sets often include thousands of features but a small number of sample points, which increases variability in feature selection and error...
The validity of a classifier depends on the precision of the error estimator used to estimate its true error. This paper considers the necessary sample size to achieve a given validity measure, namely RMS, for resubstitution and leave-one-out error estimators in the context of LDA. It provides bounds for the RMS between the true error and both the resubstitution and leave-one-out error estimators...
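The two error estimators named in the abstract above can be illustrated with a minimal sketch. To stay dependency-free it swaps in a nearest-centroid classifier as a stand-in for LDA, and the toy data and all function names are hypothetical; it shows only how resubstitution and leave-one-out estimates are computed, not the paper's RMS bounds.

```python
from statistics import mean


def fit_centroids(X, y):
    """Return {label: centroid} for feature vectors X and labels y."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the label of the nearest centroid (squared Euclidean)."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sqdist(centroids[label]))


def resubstitution_error(X, y):
    """Train on all points, then test on the same points."""
    centroids = fit_centroids(X, y)
    wrong = sum(predict(centroids, x) != t for x, t in zip(X, y))
    return wrong / len(X)


def leave_one_out_error(X, y):
    """Hold out each point in turn and train on the rest.

    Assumes every class has at least two samples, so a class never
    disappears from the training set.
    """
    wrong = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        centroids = fit_centroids(X_train, y_train)
        wrong += predict(centroids, X[i]) != y[i]
    return wrong / len(X)


# Hypothetical one-dimensional toy data with overlapping classes.
X_demo = [[0.0], [1.0], [5.0], [4.0], [6.0], [7.0]]
y_demo = [0, 0, 0, 1, 1, 1]

print("resubstitution:", resubstitution_error(X_demo, y_demo))
print("leave-one-out: ", leave_one_out_error(X_demo, y_demo))
```

On this toy set the resubstitution estimate is lower than the leave-one-out estimate, which matches the usual expectation that testing on the training data is optimistically biased.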
Support Vector Machine (SVM) is a useful technique for data classification with successful applications in different fields such as bioinformatics, image segmentation, data mining, etc. A key problem of these methods is how to choose an optimal kernel and how to optimize its parameters in the learning process of SVM. The objective of this study is to propose a Genetic Algorithm approach for parameter optimization...
Much attention has been paid to the technical research and practical application of protein subcellular location prediction, since a great number of previous works have shown the close relationship between a protein's function and its location, and since the Human Genome Project was successfully completed over the last decades. With the rapid progress of computing speed, computational intelligence...
Hierarchical Multilabel Classification is a classification problem where the classes of the examples are hierarchically structured and, additionally, each example can simultaneously belong to two or more classes in the same hierarchical level. This paper proposes a new Top-Down classification method based on a label combination process, using Artificial Neural Networks as base classifiers. The experimental...
The task of finding transcription start sites (TSSs) can be modeled as a classification problem. Relevance vector machines (RVMs) are a family of machine learning methods that represent a Bayesian approach to the training of general linear models (GLM). Based on a Markov chain Monte Carlo (MCMC) sampler, we propose a model for using the RVM to explore very large numbers of candidate features. The model...
Penalized feature selection and classification techniques are promising in bioinformatics studies of high-dimensional microarray data. The penalized objective function of penalization methods includes two parts: a classification objective function and penalty terms. We propose a novel L1 + L1 model. The classification objective function is chosen as the negative log-likelihood function based on the...
O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine and threonine sites. It is important to predict O-glycosylation sites. In this paper, we propose a new method based on kernel principal component analysis (KPCA) to predict O-glycosylation sites with window size w=9. The samples for the experiment are encoded by sparse coding and projected...
Protein classification plays an important role in bioinformatics research. Many discriminative methods, including SVM-based algorithms, are used for this task. In order to use these methods, variable-length protein sequences must be converted into fixed-length vectors. The current work presents a new method of converting sequences into vectors. The method first constructs profile...
Support vector machine (SVM) treats all data points as equally important in classification problems; therefore, it is very sensitive to noisy data and outliers. The current fuzzy approach to two-class SVM assigns a fuzzy membership to each data point in order to reduce the influence of less important data; however, computing fuzzy memberships is still a challenge. It has been found that the...
This paper presents an extension to the Rule-Based Similarity (RBS) model, a novel rough set approach to the problem of learning a similarity relation from data. The original model, proposed in [1], applied the notion of Tversky's feature contrast model in a rough set framework to facilitate an accurate case-based classification. In the dynamic RBS model, a dynamic reducts technique is used to broaden...