The task of finding transcription start sites (TSSs) can be modeled as a classification problem. Semi-Supervised Support Vector Machines (S3VMs) are an appealing method for using unlabeled data in classification. By incorporating prior biological knowledge for recognizing TSSs, we propose a Self-Training S3VM (ST-S3VM) algorithm. ST-S3VM builds an SVM classifier based on small amounts of labeled data...
Finding the location of binding sites in DNA is a difficult problem. Although the locations of some binding sites have been experimentally identified, other parts of the genome may or may not contain binding sites. This poses problems with negative data in a trainable classifier. Here we show that using randomized negative data gives a large boost in classifier performance when compared to the original...
Improvements in automated urinalysis are in high demand in laboratory practice. Urine samples with noise and imbalance increase the difficulty of identifying and classifying urine-related diseases. To improve classification performance, this paper compares the effectiveness of several learning classifiers and proposes a hybrid sampling-based ensemble learning method. The experiments show that...
Many global and intrinsic folding features are usually extracted from precursor miRNAs (pre-miRNAs), and these include some redundant and useless features. Therefore, it is essential to select the most representative feature subset, which helps improve classification efficiency. We propose a novel feature selection method based on a genetic algorithm. The information gain of feature...
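As an illustration of the quantity mentioned in the abstract above, the following minimal sketch computes the information gain of a single discrete feature with respect to class labels. It shows only the standard definition (label entropy minus conditional entropy given the feature). How the paper combines information gain with its genetic algorithm is not stated in the truncated abstract, so nothing here should be read as the authors' method. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def information_gain(feature_values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """Entropy of the labels minus their entropy conditioned on the feature.

    Assumes a discrete feature; continuous features would need to be
    binned first (an assumption, not something the abstract specifies).
    """
    total = len(labels)
    groups: dict = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy example (made-up data): one feature separates the classes, one does not.
labels: List[int] = [1, 1, 0, 0]
informative = ["a", "a", "b", "b"]
uninformative = ["a", "b", "a", "b"]
print(information_gain(informative, labels))
print(information_gain(uninformative, labels))
```

A feature that perfectly separates the classes receives the full label entropy as its gain, while a feature that is independent of the labels receives zero, which is why information gain is a common relevance score in feature selection.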
Protein structural class prediction can play a vital role in protein 3-D structure prediction by reducing the search space of 3-D structure prediction algorithms. In this paper we used a support vector machine to predict protein structural class based solely on the amino acid sequence, i.e. mainly α, mainly β, α-β and fss from the CATH protein structure database; all-α, all-β, α/β, α+β from the SCOP protein...
The objective of the current work is to develop an automatic tool to identify microbiological data types using computer vision and pattern recognition. Current systems rely on the subjective reading of profiles by a human expert. This process is time-consuming and prone to errors. Bacteriophage (phage) typing and fluorescent imaging methods are used to extract representative feature profiles and...
Optimally combining available information is one of the key challenges in knowledge-driven prediction techniques. In this study, we evaluate six Phi and Psi-based backbone alphabets. We show that the addition of predicted backbone conformations to SVM classifiers can improve fold recognition. Our experimental results show that the inclusion of predicted backbone conformations in our feature representation...
This paper reports the investigations and experimental procedures conducted to design an automatic sleep classification tool based only on features extracted with wavelets from EEG, EMG and EOG (electroencephalogram, electromyogram and electrooculogram) signals, without any visual aid or context-based evaluation. Real data collected from infants was processed and classified by several traditional and bio-inspired...
The validity of a classifier depends on the precision of the error estimator used to estimate its true error. This paper considers the necessary sample size to achieve a given validity measure, namely RMS, for resubstitution and leave-one-out error estimators in the context of LDA. It provides bounds for the RMS between the true error and both the resubstitution and leave-one-out error estimators...
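The two error estimators named in the abstract above can be sketched in a few lines. This is a minimal illustration only: it uses a one-dimensional nearest-mean classifier as a simple stand-in for LDA, and the data set and all function names are made up for the example. It does not reproduce the paper's RMS bounds.

```python
from statistics import mean
from typing import List, Tuple

Sample = Tuple[float, int]  # (feature value, class label 0 or 1)


def fit_nearest_mean(train: List[Sample]) -> Tuple[float, float]:
    """Return the class means; each class needs at least one sample."""
    m0 = mean(x for x, y in train if y == 0)
    m1 = mean(x for x, y in train if y == 1)
    return m0, m1


def predict(model: Tuple[float, float], x: float) -> int:
    """Assign x to the class whose mean is closer."""
    m0, m1 = model
    return 0 if abs(x - m0) <= abs(x - m1) else 1


def resubstitution_error(data: List[Sample]) -> float:
    """Train on all samples and test on the same samples."""
    model = fit_nearest_mean(data)
    return sum(predict(model, x) != y for x, y in data) / len(data)


def leave_one_out_error(data: List[Sample]) -> float:
    """Hold out each sample in turn, train on the rest, test on the held-out one."""
    errors = 0
    for i, (x, y) in enumerate(data):
        model = fit_nearest_mean(data[:i] + data[i + 1:])
        errors += predict(model, x) != y
    return errors / len(data)


# Hypothetical toy data: two overlapping classes on a line.
DATA: List[Sample] = [(0.0, 0), (1.0, 0), (2.4, 0), (2.6, 1), (4.0, 1), (5.0, 1)]
print("resubstitution:", resubstitution_error(DATA))
print("leave-one-out: ", leave_one_out_error(DATA))
```

On this toy set the resubstitution estimate is lower than the leave-one-out estimate, because the two borderline points help position the class means that are then used to classify them. This optimism is the usual reason for comparing the two estimators.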
This paper studies the suitability of Extreme Learning Machines (ELM) for resolving bioinformatic and biomedical classification problems. In order to test their overall performance, an experimental study is presented based on five gene microarray datasets found in bioinformatic and biomedical domains. The Fast Correlation-Based Filter (FCBF) was applied in order to identify salient expression genes...
Support Vector Machine (SVM) ensembles have been widely used to improve classification accuracy in complicated pattern recognition tasks. In this work we propose to apply an ensemble of SVMs coupled with feature-subset selection methods to alleviate the curse of dimensionality associated with expression-based classification of DNA microarray data. We compare the single SVM classifier to SVM ensembles...
Disulfide bonds play a key role in predicting the three-dimensional structure and the function of a protein. In this paper, we propose an algorithm for predicting the disulfide bonding state of each cysteine in a protein sequence. This method is based on a multi-stage framework and a support vector machine multi-classifier. We also design a new training strategy to increase the prediction...
We apply and compare a random Bayes forest classifier and three traditional classification methods to a dataset of complex benthic macroinvertebrate images of known taxonomical identity. Since in biomonitoring changes in benthic macroinvertebrate taxa proportions correspond to changes in water quality, their correct estimation is pivotal. As classification errors are passed on to the allocated proportions,...
The SVM (Support Vector Machine) is superior to artificial neural networks (such as the BP network) in classification. Its rapid development and wide application are due to the introduction of the concept of soft margin. However, the traditional soft margin SVM assigns the same misclassification cost to all samples, so its results on real data are not satisfactory...
This work presents a system for knowledge discovery from protein databases, based on an Artificial Immune System. The discovered rules have the advantage of representing comprehensible knowledge to biologist users. This task leads to a very challenging problem since a protein can be assigned multiple classes (functions or Gene Ontology (GO) terms) across several levels of the GO's term hierarchy...
DNA microarray data pose a challenge for machine learning researchers due to the large number of gene expression values they contain and the small sample sizes. To deal with this problem, feature selection methods, such as filters and wrappers, are typically applied to reduce the dimensionality. In this work, we apply a filter method before the classification and include a discretization step. The results...
Metagenomic studies inherently involve sampling genetic information from an environment potentially containing thousands of distinctly different microbial organisms. This genetic information is sequenced, producing many short fragments (<500 base pairs (bp)); each is tentatively a small representative of the DNA coding structure. Any of the fragments may belong to any of the organisms in the sample,...
Base-calling is one of many problems that can be solved using pattern recognition, the act of classifying raw data based on prior or statistical information extracted from the data into various classes. In this paper, we propose a new framework using polynomial classifiers to model electropherogram traces obtained from ABI sequencing machines to perform base-calling. Initially, pre-processing, which...
Influenza viruses continue to evolve rapidly and are responsible for seasonal epidemics and occasional, but catastrophic, pandemics. We recently demonstrated the use of decision tree and support vector machine methods in classifying pandemic swine flu viral strains with high accuracy. Here, we applied the technique of artificial neural networks for the prediction of important influenza virus antigenic...
Side effect machines operate by associating side effects with the states of a finite state machine. The use of side effect machines permits the researcher to leverage information stored in the state transition structure, making machines that might be identical as recognizers behave differently as classifiers. The side effect machines in this study associate a counter with each state so that the number...
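The idea of attaching a counter to each state can be sketched as follows. This is a minimal illustration under stated assumptions: the three-state transition table over the DNA alphabet is invented for the example, and the counter is assumed to be incremented each time a state is entered after reading a symbol. The truncated abstract does not specify either detail.

```python
from typing import Dict, List, Tuple

# Hypothetical 3-state machine over the DNA alphabet. The transition table is
# made up for illustration and is not taken from the study.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "A"): 1, (0, "C"): 0, (0, "G"): 2, (0, "T"): 0,
    (1, "A"): 1, (1, "C"): 2, (1, "G"): 0, (1, "T"): 2,
    (2, "A"): 0, (2, "C"): 1, (2, "G"): 2, (2, "T"): 1,
}
NUM_STATES = 3


def state_visit_counts(sequence: str, start: int = 0) -> List[int]:
    """Run the machine over the sequence and count how often each state is entered.

    Assumption: the side effect is a per-state counter bumped on every entry;
    the start state is not counted until the machine returns to it.
    """
    counts = [0] * NUM_STATES
    state = start
    for symbol in sequence:
        state = TRANSITIONS[(state, symbol)]
        counts[state] += 1  # side effect attached to the state just entered
    return counts


# The resulting count vector can be used as a fixed-length feature vector
# for a downstream classifier, regardless of the sequence length.
print(state_visit_counts("ACGT"))
print(state_visit_counts("GATTACA"))
```

Because the features come from the path taken through the states and not from a single accept or reject decision, two machines that accept the same language can still produce different feature vectors, which matches the distinction between recognizers and classifiers drawn in the abstract.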