Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and are linked to many dysregulations. Researchers have reported a large number of lncRNA-associated human diseases over the past decade. In this article we employed the Non-negative Matrix Factorization method to develop a low-dimensional computational model that can describe the existing knowledge about lncRNA-disease...
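To make the factorization idea concrete, here is a minimal sketch of plain non-negative matrix factorization using the standard Lee-Seung multiplicative updates for the Frobenius loss. It is a generic illustration, not the authors' model: the function names, the rank `k`, and the iteration count are assumptions, and no regularization or lncRNA-specific similarity information is included.

```python
import random

def matmul(A, B):
    # Plain list-of-lists matrix product.
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def frob_error(V, W, H):
    # Squared Frobenius norm of V - W H.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx))

def nmf(V, k, iters=200, seed=0, eps=1e-9):
    """Factor a nonnegative n x m matrix V as W (n x k) times H (k x m).
    Generic Lee-Seung multiplicative updates (illustrative, not the paper's model)."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num = matmul(Wt, V)               # W^T V
        den = matmul(matmul(Wt, W), H)    # W^T W H
        H = [[h * nu / (d + eps) for h, nu, d in zip(rh, rn, rd)]
             for rh, rn, rd in zip(H, num, den)]
        Ht = transpose(H)
        num = matmul(V, Ht)               # V H^T
        den = matmul(W, matmul(H, Ht))    # W H H^T
        W = [[w * nu / (d + eps) for w, nu, d in zip(rw, rn, rd)]
             for rw, rn, rd in zip(W, num, den)]
    return W, H
```

With a toy 0/1 association matrix `V` (rows as lncRNAs, columns as diseases, both hypothetical here), the entries of `matmul(W, H)` act as smoothed association scores; unobserved pairs with high scores would be the candidate associations such a model proposes.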
Many artificial intelligence techniques have been developed to process the constantly increasing volume of data and extract meaningful information from it. Accurately annotating unknown proteins by classifying their sequences into existing superfamilies is considered a critical and challenging task in bioinformatics and computational biology. This classification would be...
In this study, instead of traditional approaches to virus classification, we proposed a novel vector-space-model approach to virus classification based on two types of genome sequence, DNA and CDS. For DNA sequences, the k-mer approach was adopted for pattern extraction, and the entropy of the pattern frequency distribution among classes was used for pattern weighting. For CDS sequences,...
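The k-mer extraction and entropy-based weighting steps can be sketched as follows. The abstract does not give the exact weighting formula, so the weight used here, one minus the normalized entropy of a k-mer's frequency distribution over classes, is an assumption: it is 1 for a k-mer seen in a single class and 0 for one spread evenly across all classes. Function names are hypothetical.

```python
import math
from collections import Counter

def kmer_counts(seq, k):
    """Count overlapping k-mers in a DNA string, skipping k-mers with non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def entropy_weights(class_seqs, k):
    """class_seqs: dict mapping class label -> list of DNA sequences.
    Returns dict k-mer -> weight in [0, 1].
    Assumed weighting: 1 - H / log2(C), where H is the entropy of the k-mer's
    frequency distribution over the C classes (not specified in the abstract)."""
    per_class = {c: Counter() for c in class_seqs}
    for c, seqs in class_seqs.items():
        for s in seqs:
            per_class[c].update(kmer_counts(s, k))
    n_classes = len(class_seqs)
    weights = {}
    for kmer in set().union(*per_class.values()):
        freqs = [per_class[c][kmer] for c in class_seqs]
        total = sum(freqs)
        h = -sum((f / total) * math.log2(f / total) for f in freqs if f > 0)
        weights[kmer] = 1.0 - h / math.log2(n_classes) if n_classes > 1 else 1.0
    return weights
```

Raw counts are used per class for simplicity; with unbalanced classes one would likely normalize each class's counts by its total before computing the entropy.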
Identification of protein coding regions (exons) in eukaryotic genomic sequences is an active area of research at present. Mapping symbolic genomic sequences to numeric sequences is the first step required for processing them with digital signal processing (DSP) tools. For DFT-based methods, the paired-numeric and frequency-of-nucleotide mappings are reported as the best mapping schemes. In this work performance...
Nonnegative matrix factorization is used extensively for feature extraction and clustering analysis. Recently, many sparsity constraints, such as the L1 penalty, have been introduced for sparse nonnegative matrix factorization. Inspired by sparsity measures from linear regression models, this paper proposes to integrate nonnegative matrix factorization with another sparsity constraint, the elastic...
Prior to applying digital signal processing techniques for the identification of protein coding regions, the DNA alphabet must be mapped into numerical sequences. In this paper, the performance of existing DNA-to-numerical mapping techniques is analyzed at the nucleotide level for the identification of protein coding regions using tapered-window-based short-time discrete Fourier transform (ST-DFT)...
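As an illustration of the mapping-plus-ST-DFT pipeline, the sketch below uses the Voss binary indicator mapping (one common scheme, not necessarily one of those compared in the paper) and evaluates a Hamming-tapered short-time DFT only at bin N/3, where coding regions tend to show a period-3 peak. The window length of 120, step of 3, and the Hamming taper are illustrative assumptions.

```python
import cmath
import math

def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (Voss mapping)."""
    seq = seq.upper()
    return {b: [1.0 if ch == b else 0.0 for ch in seq] for b in "ACGT"}

def period3_power(seq, window=120, step=3):
    """Sliding-window period-3 power: Hamming-tapered short-time DFT evaluated
    at bin N/3 (N = window length), summed over the four indicator sequences.
    Window length, step and taper are illustrative assumptions."""
    if window % 3:
        raise ValueError("window length should be a multiple of 3")
    ind = voss_indicators(seq)
    taper = [0.54 - 0.46 * math.cos(2 * math.pi * n / (window - 1)) for n in range(window)]
    # e^{-j 2 pi n (N/3) / N} = e^{-j 2 pi n / 3}
    twiddle = [cmath.exp(-2j * math.pi * n / 3) for n in range(window)]
    out = []
    for start in range(0, len(seq) - window + 1, step):
        total = 0.0
        for x in ind.values():
            acc = sum(taper[n] * x[start + n] * twiddle[n] for n in range(window))
            total += abs(acc) ** 2
        out.append((start + window // 2, total))  # (window centre, power)
    return out
```

A strongly period-3 input such as `"ATG" * 100` yields large values along its length, whereas a homopolymer such as `"A" * 300` yields values near zero, because the taper suppresses leakage from the constant component.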
The accuracy of methods based on power spectrum analysis depends on the threshold used to discriminate coding from non-coding sequences. Because gene structure differs between organisms, we inferred that each organism has its own optimal gene prediction threshold. To prove this, we analyzed real biological data, and found that there are indeed different optimal thresholds for...
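One simple way to find an organism-specific threshold is to sweep candidate values over labelled coding and non-coding scores from that organism and keep the best one. The sketch below assumes classification accuracy as the selection criterion; the abstract does not state which criterion was used, and the function name is hypothetical.

```python
def best_threshold(scores, labels):
    """Return (threshold, accuracy) maximizing the accuracy of the rule
    'predict coding if score >= threshold' on labelled examples.
    scores: list of floats (e.g. period-3 power); labels: list of bools (True = coding).
    Accuracy as the selection criterion is an assumption."""
    candidates = sorted(set(scores)) + [float("inf")]
    best_t, best_acc = None, -1.0
    n = len(scores)
    for t in candidates:
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        acc = correct / n
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc
```

Running it separately on each organism's labelled data gives one threshold per organism; two organisms whose score distributions sit at different scales will generally yield different thresholds.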
O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at specific serine and threonine sites. Predicting O-glycosylation sites is therefore important. In this paper, we propose a new method based on kernel principal component analysis (KPCA) to predict O-glycosylation sites with a window size of w=9. The samples for the experiment are encoded by sparse coding and projected...
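The sparse-coding step can be sketched as a one-hot encoding of the w=9 residue window centred on each serine or threonine. The 21-letter alphabet (20 amino acids plus an `X` symbol for padding or unknown residues) is an assumption, and the subsequent KPCA projection is not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
ALPHABET = AMINO_ACIDS + "X"   # 'X' = padding / unknown residue (assumption)

def window_around(seq, center, w=9):
    """Return the w-residue window centred on seq[center], padded with 'X' at the ends."""
    half = w // 2
    return "".join(seq[i] if 0 <= i < len(seq) else "X"
                   for i in range(center - half, center + half + 1))

def sparse_encode(window):
    """Sparse (one-hot) coding: each residue becomes a 21-dim 0/1 vector; vectors are concatenated."""
    vec = []
    for ch in window:
        ch = ch if ch in ALPHABET else "X"
        vec.extend(1 if ch == a else 0 for a in ALPHABET)
    return vec

def candidate_sites(seq, w=9):
    """Encode every serine/threonine position as a candidate O-glycosylation site."""
    return [(i, sparse_encode(window_around(seq, i, w)))
            for i, ch in enumerate(seq) if ch in "ST"]
```

Each candidate becomes a 9 x 21 = 189-dimensional binary vector with exactly nine ones, which is the kind of input a kernel PCA projection would then operate on.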
Automatic prediction of protein three-dimensional structures from their amino acid sequences has become one of the most important research fields in bioinformatics. This in turn increases the importance of assessing the quality of these protein models. Protein three-dimensional structure evaluation is a complex problem in computational structural biology. We attempt to solve this problem using SVM and...
Next-generation sequencing is quickly changing long-standing paradigms of genomics in terms of what is feasible to accomplish within a “research lifetime” and what is supposed to remain beyond the limits of reliable experimental analysis. Sequencing and mapping of a prokaryote transcriptome can provide experimental validation for computationally predicted genes annotated in a prokaryotic genome...
Genomes of many organisms have been sequenced over the last few years. However, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address part of this problem: the location of genes along a genome. We propose a multiobjective methodology to combine algorithms into an aggregation scheme in order to obtain optimal methods'...
In recent years, many studies have focused on improving the accuracy of transmembrane segment prediction, and many significant results have been achieved. In spite of these considerable results, the existing methods cannot explain how a learning result is reached and why a prediction decision is made. The explanation of the decision process is important for acceptance...
We present a method for predicting the bindings between microRNAs (miRNAs) and their target genes. A novel encoding of the miRNAs, the binding sites (i.e. the target genes) and the flanking sequences of the binding sites is adopted to represent the related information comprehensively. A feature selection method, Minimum Redundancy Maximum Relevance (mRMR), is used to filter out ineffective and redundant...
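The mRMR feature selection step can be illustrated with a minimal greedy implementation on discrete features. This sketch uses the difference form of the criterion (relevance minus mean redundancy, both measured by mutual information) and plug-in probability estimates from counts; the paper's exact variant, discretization, and feature names are not given, so those choices are assumptions.

```python
import math
from collections import Counter

def mutual_info(x, y):
    """Plug-in mutual information (bits) between two equal-length discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def mrmr(features, target, k):
    """Greedy mRMR (difference form, an assumed variant): repeatedly pick the feature
    maximizing I(f; target) - mean over selected s of I(f; s).
    features: dict name -> list of discrete values. Returns names in selection order."""
    relevance = {f: mutual_info(v, target) for f, v in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_info(features[f], features[s]) for s in selected)
            return relevance[f] - redundancy / len(selected)
        best = max(sorted(remaining), key=score)  # sorted() makes tie-breaking deterministic
        selected.append(best)
        remaining.remove(best)
    return selected
```

In a toy case where `a_copy` duplicates feature `a`, the redundancy term cancels the duplicate's relevance, so after choosing `a` the method prefers an independent informative feature `b` over `a_copy`.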
Support vector machines (SVMs) are known to be excellent algorithms for classification problems. The principal disadvantage of SVMs is their excessive training time on large data sets, such as DNA sequences. This paper presents a novel SVM classification method that significantly reduces the input data set using a Bayesian technique. Using this system, we are able to predict with high accuracy...
Recognition of coding sequences in a complete genome is an important problem in DNA sequence analysis. Their rapid and accurate recognition contributes to various related research and applications. In this paper, we aim to distinguish the coding sequences from the non-coding sequences in a prokaryote complete genome. We select a data set of 51 available bacterial genomes. Then, we use the global descriptor...
The protein secondary structure (PSS) prediction system presented in this paper is a subsystem of the potato bioinformation research platform. The proposed method is a novel and practical PSS prediction method that is based on the nucleic acid sequence (NAS), uses a combined neural network (CNN), and applies an improved genetic algorithm (GA) to optimize the connection weights of the CNN. The experimental results...
The theory and methods of signal processing are becoming increasingly important in bioinformatics and systems biology. Ordinary Fourier analysis is satisfactory for detecting the period-3 property in long DNA sequences, but is much less successful for short ones. An improved Fourier method is proposed to increase the accuracy of gene identification by amplifying period-3 behavior...
In this paper, a protein contact prediction method based on information granules and an RBF neural network is proposed. The method improves the encoding of protein structure data and the classifier's performance to enhance the accuracy of protein contact prediction. Two hundred nonhomologous proteins from the PDB database were encoded according to this encoding approach and were taken...
Due to the enormous amount of data in DNA sequences to be processed, the computational complexity and speed are important issues to be considered. In this paper, a new integrative method is presented for predicting protein coding regions. We first establish a Takagi-Sugeno fuzzy model to identify the first nucleotide of a codon in coding regions, then the time-frequency characteristics of the output...
The accurate recognition of translation initiation sites (TISs) is an important stage in genome annotation. Due to the complicated nature of the genetic information and our incomplete understanding of it, TIS prediction remains a challenging undertaking. Many computational approaches have been proposed in the literature, some of which have yielded quite impressive performance. However, most of them...