The goal of a biometric identification system is to determine the identity of the input biometric data. In such a system, the input probe (e.g., a face image) is compared against the labeled gallery data (e.g., face images in a watch-list) resulting in a set of ranked scores pertaining to the different identities in the gallery database. The identity corresponding to the best score is then associated...
Enzyme family prediction is extensively used to identify new family members. It is well known that enzyme function is strongly related to its structure. In this work, we propose a novel approach, CM-HMM, to predict enzyme subfamilies, which yields high accuracy when sequence similarity is less than 30%. Moreover, it provides descriptive information for 3-dimensional analysis. Our method used information...
We apply active learning and logistic regression to perform statistical analysis of Mascot peptide identification. Uncertainty sampling is used to select examples for labeling, and selected examples are labeled with reference data as the oracle. In each iteration of active learning, the penalized Newton-Raphson method is used to solve the logistic regression model. By testing the method on two datasets...
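The uncertainty sampling step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a binary logistic model whose weights are already fitted, and it picks the unlabeled examples whose predicted probability is closest to 0.5. The function names (`predict_proba`, `most_uncertain`) and the toy data are hypothetical, and the penalized Newton-Raphson fitting step from the paper is not reproduced here.

```python
import math


def predict_proba(weights, bias, features):
    """Logistic model probability for one example (numerically stable sigmoid)."""
    z = bias + sum(w * f for w, f in zip(weights, features))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def most_uncertain(weights, bias, pool, k=1):
    """Return indices of the k pool examples whose probability is closest to 0.5."""
    ranked = sorted(
        range(len(pool)),
        key=lambda i: abs(predict_proba(weights, bias, pool[i]) - 0.5),
    )
    return ranked[:k]


# Hypothetical one-feature model and unlabeled pool, for illustration only.
weights, bias = [1.0], 0.0
pool = [[-3.0], [0.1], [2.0]]
query_indices = most_uncertain(weights, bias, pool, k=2)
```

In an active learning loop, the selected indices would be sent to the oracle for labels, added to the training set, and the model refitted before the next selection round.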
Research in protein interactions has revealed that these interactions can be considered in the form of a network and that there exist proteins with various degrees of connectivity in the network. In this paper a set of protein sequences is treated as a system of linear equations to see whether there exists any relation between the degree of connectivity and the amino acid frequencies of protein...
Metagenomics is the study of environmental samples. Because few tools exist for metagenomic analysis, a natural step has been to utilize the popular homology tool, BLAST, to search for sequence similarity between DNA reads and an administered database. Most biologists use this method today without knowing BLAST's accuracy, especially when a particular taxonomic class is under-represented in the database...
The World Wide Web has grown to become one of the most pervasive and comprehensive information repositories available today and many compare the knowledge contained within it to a modern day library of Alexandria. Yet, despite its vastness, one of the downsides to using Web-based information sources is that the information contained in most Web pages has never been reviewed for accuracy or quality...
The functions of proteins are closely related to their subcellular locations. In the post-proteomics era, the amount of gene and protein data grows exponentially, which necessitates the prediction of subcellular localization by computational means. This paper proposes mitigating the computation burden of alignment-based approaches to subcellular localization prediction by using the information provided...
De novo peptide sequencing is one of the most challenging topics in the field of computational proteomics. In this manuscript, a novel method based on virtual database searching is presented to improve the performance of de novo sequencing for the data from high resolution LTQ-FT mass spectrometry. Our method directly generates a virtual database from each spectrum and applies a search engine to match...
The Gene Ontology provides a controlled vocabulary to unify the presentation of gene and gene product attributes across species and genomes. It is widely used in biological data analysis and supported by popular biological databases. How to measure the relationship between GO terms has become a hot topic nowadays. In this paper, we propose a new method to measure the semantic similarity between Gene...
Sample Entropy (SampEn) is a nonlinear regularity index that requires the a priori selection of three parameters: the length of the sequences to be compared, m, the pattern similarity tolerance, r, and the number of samples under analysis, N. Appropriate values for m, r and N have been recommended in some cases, such as heart rate, hormonal data, etc., but no guidelines exist for the selection of...
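To make the roles of m, r and N in the abstract above concrete, here is a minimal sketch of the commonly used SampEn definition: the negative logarithm of the ratio between the number of template pairs matching at length m + 1 and the number matching at length m. It assumes the Chebyshev (maximum) distance, treats r as an absolute tolerance rather than a fraction of the standard deviation, and counts a match when the distance is at most r; these are conventions chosen for the illustration, not details taken from the paper.

```python
import math


def sample_entropy(x, m=2, r=0.2):
    """SampEn(m, r, N) of sequence x, where N = len(x).

    r is an absolute tolerance here; callers who want the usual
    "fraction of the standard deviation" convention must scale it first.
    """
    n = len(x)
    if n <= m + 1:
        raise ValueError("sequence too short for the chosen m")

    def count_matches(length):
        # Use the first N - m templates for both lengths so the two
        # counts are comparable; self-matches (i == j) are excluded.
        count = 0
        for i in range(n - m):
            for j in range(i + 1, n - m):
                dist = max(abs(x[i + k] - x[j + k]) for k in range(length))
                if dist <= r:
                    count += 1
        return count

    b = count_matches(m)      # matching pairs of length m
    a = count_matches(m + 1)  # matching pairs of length m + 1
    if a == 0 or b == 0:
        return math.inf       # undefined in theory; reported as infinity here
    return -math.log(a / b)


# A perfectly regular series has SampEn equal to zero.
regular = [0.0, 1.0] * 10
value = sample_entropy(regular, m=2, r=0.1)
```

The double loop is quadratic in N, which is acceptable for short records but slow for long ones.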
This paper presents a novel discriminant analysis (DA) for feature extraction using mutual information (MI) and Fisher discriminant analysis (MI-FDA). Most DA algorithms for feature extraction are based on a transformation which maximizes the between-class scatter and minimizes the within-class scatter. In contrast, the proposed method uses the Fisher's criterion to find a transformation that maximizes...
Neural networks are one of the successful methods for protein secondary structure prediction. This technology is continually modified and improved, and other methods are also combined with it to get better results. In this paper we trained a feed-forward neural network with trans-membrane proteins for helix prediction. Using the Java object oriented neural engine (JOONE), we achieved an accuracy of 71%. This paper is...
The amount of data produced by the several genomic sequencing projects has increased dramatically in recent years. One of the main goals of bioinformatics is to analyze biological data aiming at identifying genes. The splice junction recognition problem is an important part of the gene detection problem. This work evaluates the performance of two classification models, derived from the weight matrix...
In this paper, a method for predicting protein contacts on the basis of information granules and an RBF neural network is put forward. This method improves the encoding approach of protein structure data and the classifier performance to enhance the prediction accuracy of protein contacts. 200 nonhomologous proteins from the PDB database were encoded according to the encoding approach and were taken...
In this study, we investigated a number of data analysis methods to discover useful genomic data for predicting protein function. One such method is data mining prediction (DMP). DMP is based on a combination of evidence from the attributes of amino acids, their predicted structure, and their phylogenetic patterns. DMP is, to the best of our knowledge, the first non-SIM based prediction method to have been...
The study of transcriptional regulation mechanisms is one of the current research issues in the post-genomic era. However, the number of known regulatory elements is rather limited, and the accuracy of the state-of-the-art identification methods is still far from satisfactory. Therefore, effective information and personalized service are critical features to be provided in a system, but the existing systems...
In this paper, we report a systematic in-depth analysis of 54 known pre-miRNAs from Apis mellifera (honey bee) with a set of 14 attributes. We have derived this set of attributes from secondary structure data generated with RNAfold from pre-miRNA sequences in the Apis mellifera database. Principal component analysis has been applied for dimension reduction. It reduces dimension of this...
Classifying chemical compounds is an active topic in drug design and other cheminformatics applications. Graphs are general tools for organizing information from heterogeneous sources and have been applied in modelling many kinds of biological data. With the fast accumulation of chemical structure data, building highly accurate predictive models for chemical graphs emerges as a new challenge. In...
Gene ontology (GO) annotation is a controlled vocabulary of terms and phrases describing the function of genes and gene products, which has been used successfully in predicting subcellular and subnuclear localization. Generally, each gene product is annotated by very few GO terms from more than 25,000 annotations available at present. How to represent a protein sequence using GO terms as features plays an...
One of the most famous approaches for the segmentation of color images is finding clusters in the color space. Shapes of these clusters are often complex and the time complexity of the existing algorithms for finding clusters of different shapes is usually high. In this paper, a novel clustering algorithm is proposed and used for the image segmentation purpose. This algorithm distinguishes clusters...