ATP is a ubiquitous nucleotide that provides energy for cellular activities, catalyzes chemical reactions, and is involved in cellular signaling. The knowledge of the ATP-protein interactions helps with annotation of protein functions and finds applications in drug design. We propose a high-throughput machine learning-based predictor, ATPsite, which identifies ATP-binding residues from protein sequences...
DNA-binding proteins perform their functions through specific or non-specific sequence recognition. Although many sequence-based approaches have been proposed to identify DNA-binding residues on proteins or protein-binding sites on DNA sequences with satisfactory performance, it remains a challenging task to unveil the exact mechanism of protein-DNA interactions without crystal complex structures. Without...
The nucleosome, a nucleoprotein structure formed by coiling 147 bp of DNA around an octamer of histone proteins, is the fundamental repeating unit of eukaryotic chromatin. By regulating the access of biological machineries to underlying cis-regulatory elements, its mobility has been implicated in many important cellular processes. Although it has been known that various factors, such as DNA sequences,...
As a crucial step for other tasks, such as human gene/protein normalization, relationship extraction and hypothesis generation, biomedical named entity recognition remains a challenging task. This paper presents a two-phase approach based on semi-CRFs and novel feature sets. Semi-CRFs assign a label to a segment rather than to a single word, which is more natural than the other machine learning methods....
Location proteomics is concerned with the systematic analysis of the subcellular location of proteins. In order to perform comprehensive analysis of all protein location patterns, automated methods are needed. With the goal of extending automated subcellular location pattern analysis methods to high resolution images of tissues, 3D confocal microscope images of polarized CaCo2 cells immunostained...
With the accelerating advancement of biomedical research, it has been widely accepted that genetic variation plays a critical role in the pathogenesis of human inherited diseases. As an important type of genetic variation, nonsynonymous single nucleotide polymorphisms (nsSNPs) that occur in protein coding regions lead to amino acid substitutions in proteins, affecting structures and functions of proteins,...
Transcription factor binding site (TFBS) motifs in genomic DNA play important functional roles in gene expression and regulation. Accurately identifying the motifs is thus an important problem in bioinformatics. However, exhaustively enumerating all possible locations for a motif in a set of sequences is computationally intractable. Many heuristic or approximation algorithms and machine learning...
Post-translational modifications pegged on to the N-terminal tails of the nucleosome's core histone proteins determine the transcriptional activity of that chromosomal region, leading to the histone code hypothesis. We rely on recently produced experimental data on genome-wide maps of chromatin state to derive computational models delineating the hidden patterns of post-translational modifications....
The following topics are dealt with: artificial intelligence and machine learning in bioinformatics; RNA design, RNA secondary structure prediction algorithms; sequence and phylogenetic analysis; genetic algorithm; DNA; protein and RNA structure prediction, folding, and docking; artificial neural network; toxins; protein-protein interactions; gene finding and microarray analysis; very large biological...
A new machine learning approach has been developed in this study for sequence-based prediction of DNA-binding residues in proteins. The approach used both the labeled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented...
DNA-binding proteins play an important role in various intra- and extra-cellular activities. The key part of such a protein is the DNA-binding region, also called the DNA-binding domain (DBD). However, it is hard to find DBDs by means of homology search or hidden Markov models because of the wide variety of their sequences. In this work, we develop a kernel-based machine learning method by combination of multiple...
PIL5 is a member of the bHLH transcription factor superfamily and plays crucial roles in the phytochrome-mediated seed germination process in Arabidopsis. While our previous study showed that PIL5 binds to the G-box (CACGTG) motif with high affinity, other attributes must be involved in determining regulatory specificity. In this study we performed a ChIP-chip assay to obtain genome-wide PIL5 binding...
A hybrid approach combining the self-organizing map (SOM) and the hidden Markov model (HMM) is presented. The self-organizing hidden Markov model map (SOHMMM) establishes a cross-section between the theoretic foundations and algorithmic realizations of its constituents. The respective architectures and learning methodologies are blended together in an attempt to meet the increasing requirements imposed...
Sex steroid hormone receptors bind to regions of the DNA called hormone response elements (HREs) to facilitate the regulation of gene expression. While the biological, functional and molecular basis of this interaction between the response elements and their corresponding transcription factors is not fully understood, the sequences of these HREs are known to be conserved for certain nucleotides...
We have investigated strategies for enhancing ensemble learning algorithms for the analysis of high-dimensional biological data. Specifically we investigated strategies to force classifiers to consider the possible interactions between features. As a result an algorithm that induces decision trees with a feature non-replacement mechanism has been devised and tested on DNA microarray and proteomic...
Discovering the "recognition code" governing protein-DNA interaction has been an important topic in bioinformatics for decades. While other studies have focused on analyzing the frequency of amino acid-base contacts, this study attempts to discover the structural and physicochemical features of proteins that determine the specificity of amino acid-base contacts. For each amino acid...
Bioinformatics is the computing response to the molecular revolution in biology. This revolution has reshaped the life sciences and given us a deep understanding of DNA sequences, RNA synthesis and the generation of proteins. This process can be represented as gene expression of molecular autoregulatory feedback loop systems. In this paper, the annealing robust fuzzy basis function (ARFBF) is proposed...
The advent of high-throughput sequencing technology has led to an explosion of available DNA sequence data. The structures and functions of the proteins coded for by sequenced genomes remain largely unknown. Automated identification of protein functions and interactions has largely relied on known 3D structures or sequence homologues. In particular, intrinsically unstructured or disordered proteins...
Annotation of the functional sites on the surface of a protein has been the subject of many studies. In this regard, the search for attributes and features characterizing these sites is of prime consequence. Here, we present an implementation of a kernel-based machine learning protocol for identifying the residues on a DNA-binding protein that form the interface with the DNA. Sequence and structural features...
We describe a new binary algorithm for the prediction of alpha and beta protein folding types from RNA, DNA and amino acid sequences. The method enables quick, simple and accurate prediction of alpha and beta protein folds on a personal computer by means of a few binary patterns of coded amino acid and nucleotide physicochemical properties. The algorithm was tested with machine learning SMO (sequential...