With the rapid development of information technology, the number of datasets, as well as their complexity and dimensionality, has been growing dramatically. This growth of biological data and non-biological commercial databases has become a challenging issue in data mining. Classification is one of the major tools in this research area. However, the performance of classification may...
The concept of linear separability of gene expression data sets with respect to two classes has recently been studied in the literature. The problem is to efficiently find all pairs of genes which induce a linear separation of the data. It has been suggested that an underlying molecular mechanism relates the two genes of a separating pair to the phenotype under study, such as a specific cancer...
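The gene-pair problem stated in this abstract can be illustrated with a small brute-force sketch. This is not the efficient method the paper studies; it only shows what "a pair of genes induces a linear separation" means. The function names, the dictionary layout of `expr`, and the toy values are assumptions made for illustration. The check relies on the fact that two finite planar point sets are strictly separable exactly when some line normal, taken from a difference of two points or its perpendicular, separates their projections.

```python
from itertools import combinations


def separable_2d(a, b):
    """Return True if the 2-D point sets a and b can be strictly separated by a line."""
    normals = []
    for (px, py), (qx, qy) in combinations(a + b, 2):
        dx, dy = qx - px, qy - py
        if dx == 0 and dy == 0:
            continue  # identical points give no direction
        normals.append((dx, dy))   # normal along the difference of two points
        normals.append((-dy, dx))  # normal perpendicular to that difference
    for nx, ny in normals:
        # Project both classes onto the candidate normal and compare ranges.
        pa = [nx * x + ny * y for x, y in a]
        pb = [nx * x + ny * y for x, y in b]
        if max(pa) < min(pb) or max(pb) < min(pa):
            return True
    return False


def separating_gene_pairs(expr, labels):
    """List gene pairs whose 2-D expression points separate the two classes.

    expr   -- hypothetical layout: gene name -> list of expression values, one per sample
    labels -- list of 0/1 class labels, one per sample
    """
    pairs = []
    for g, h in combinations(sorted(expr), 2):
        a = [(expr[g][i], expr[h][i]) for i, y in enumerate(labels) if y == 0]
        b = [(expr[g][i], expr[h][i]) for i, y in enumerate(labels) if y == 1]
        if a and b and separable_2d(a, b):
            pairs.append((g, h))
    return pairs


# Toy data (made up): three genes measured on four samples.
toy_expr = {"g1": [1, 2, 3, 4], "g2": [1, 3, 2, 4], "g3": [0, 1, 1, 0]}
toy_labels = [0, 0, 1, 1]
toy_pairs = separating_gene_pairs(toy_expr, toy_labels)
```

The sketch tests every gene pair and every candidate normal, so its cost grows quickly with the number of genes and samples, which is why an efficient algorithm matters for real expression data.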
RNA-seq produces detailed information including length, strand and pairing states, which can be leveraged to characterize RNA functional categories using machine-learning approaches. Using fruit fly small-RNA-seq data, we demonstrate that by combining read length correlation with multi-class classifier models, we can classify four non-coding RNA function classes with high precision.
Accurately modeling the DNA sequence preferences of transcription factors and predicting their genomic binding sites are key problems in regulatory genomics. These efforts have long been frustrated by the limited availability and accuracy of TF binding site motifs. Today, protein binding microarray (PBM) experiments and chromatin immunoprecipitation followed by sequencing (ChIP-seq) experiments are...
Many attempts have been made to deal with missing values (MVs) in microarray data representing gene expression. This is a problematic issue, as many data analysis techniques are not robust to missing data. Most of the MV imputation methods currently in use have been evaluated only in terms of the similarity between the original and imputed data. While imputed expression values themselves...
B-cell epitopes play an important role in developing synthetic peptide vaccines and inducing antibody responses. Identifying epitopes through biological experiments is time-consuming and demands substantial experimental resources. Nevertheless, designing a computer-aided B-cell linear epitope prediction system with high precision is an important yet challenging task. In this paper,...
Selection of reliable genes from microarray gene expression data is essential for diagnostic testing and successful treatment. In this regard, a rough set based gene selection algorithm was recently developed to select genes from microarray data. In this paper, a fuzzy discretization method is proposed for the rough set based gene selection algorithm to compute relevance and significance of continuous...
The combination of local features, complementary feature types, and relative position information has been successfully applied to many object-class recognition tasks. Stacking is a common classification approach that combines the results from multiple classifiers, having the added benefit of allowing each classifier to handle a different feature space. However, the standard stacking method by its...
The task of finding transcription start sites (TSSs) can be modeled as a classification problem. Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification. By incorporating prior biological knowledge for recognizing TSSs, we propose a Self-Training S3VM (ST-S3VM) algorithm. ST-S3VM builds an SVM classifier based on small amounts of labeled data...
Finding the location of binding sites in DNA is a difficult problem. Although the locations of some binding sites have been experimentally identified, other parts of the genome may or may not contain binding sites. This poses problems with negative data in a trainable classifier. Here we show that using randomized negative data gives a large boost in classifier performance when compared to the original...
In the domain of agricultural robotics, one major application is crop scouting, e.g., for the task of weed control. A key enabler for this task is robust detection and classification of plants and their species. Automatically distinguishing between plant species is challenging because some species look very similar. It is also difficult to translate the symbolic high-level description of the...
Identifying the ion types for a mass spectrum is essential for interpreting the spectrum and deriving its peptide sequence. In this paper, we proposed a novel method for identifying ion types and deriving matched peptide sequences for tandem mass spectra. We first divided our dataset into a training set and a testing set and then preprocessed the data using a Support Vector Machine and a 5-fold cross...
Quantitative structure-activity relationship (QSAR) studies on the toxicity of 91 organic compounds to Chlorella vulgaris have been performed using the ν-support vector machine (ν-SVM) algorithm, taking 2D-autocorrelation descriptors as the structural parameters, with variable selection based on the particle swarm optimization (PSO) method. The correlation coefficient (R²) and Q²cv of the PSO-ν-SVM model...
MicroRNAs (miRNAs) are known to regulate transcription and/or protein translation of hundreds of genes. Despite their importance, the functions of most human miRNAs are still poorly understood. In this paper, we propose an SVM-based algorithm, PathMicrO, that elucidates miRNA function by predicting miRNA-regulated pathways. PathMicrO combines the sequence-level target predictions with the...
Applications of next-generation sequencing technologies have the potential to bring revolutionary changes to medicine and biology. However, coverage bias can pose a challenge to short read data analysis tools, which rely on high coverage. To address this issue we have developed a support vector machine (SVM) based method for predicting low coverage prone (LCP) regions on a given genome. The developed...
Protein structural class prediction can play a vital role in protein 3-D structure prediction by reducing the search space of 3-D structure prediction algorithms. In this paper we used a support vector machine to predict protein structural class based solely on amino acid sequence, i.e. mainly α, mainly β, α-β and fss from the CATH protein structure database; all-α, all-β, α/β, α+β from the SCOP protein...
A key step in the development of an adaptive immune response to vaccines is the binding of peptides to molecules of the Major Histocompatibility Complex (MHC) for presentation to T lymphocytes, which are thereby activated. Several algorithms have been proposed for such binding predictions, but they are limited to a small number of MHC molecules and have imperfect predictive power. We are undertaking...
As a growing number of protein structures are resolved without known functions, using computational methods to help predict protein functions from the structures becomes more and more important. Some computational methods predict protein functions by aligning to homologous proteins with known functions, but they fail to work if such homology cannot be identified. In this paper we classify enzymes/non-enzymes...
This paper proposes a local linear multi-SVM method based on a composite kernel for solving classification tasks in gene function prediction. The proposed method realizes a nonlinear separating boundary by estimating a series of piecewise linear boundaries. Firstly, according to the distribution information of the training data, a guided partitioning approach composed of separating boundary detection and...
Analysis of fertile material such as flowers and fruit is a key factor in the proper identification of plant species. Although object recognition is a mature research area, its use in automated plant identification is still relatively new. This paper describes a novel method of detecting fertile material in plant images using rectangular features. Rectangular features are obtained for the...