A set of protein pairs predicted to interact, with a high ratio of true positives, is valuable for target selection in experiments such as protein structure determination. Our goal in this paper is to investigate the problem of finding such a set of protein pairs in an organism using machine learning methods. The yeast genome was taken for this study, and a support vector machine was adopted as the classification...
Structural genomics is an international effort to determine the three-dimensional shapes of all important biological macromolecules, with a primary focus on proteins. Target proteins should be selected according to a strategy that is medically and biologically relevant, of good financial value, and tractable. In 2003, we presented the "Pfam5000" strategy, which involves selecting the 5,000...
The identification of protein-protein interactions, along with their spatial and temporal localization, provides vital data for assigning functional information to proteins. Historically, these data sets, obtained from fluorescence microscopy, have been analyzed manually, a process that is both time-consuming and tedious. The development of an automated system that can measure the location dynamics of the...
A new method based on support vector regression (SVR) has been introduced to predict the relative solvent accessibility (RSA) of residues from protein primary sequences, using the local information of the primary sequence as input. Unlike most previous methods, which are designed to predict the exposure state (exposed/buried, exposed/intermediate/buried, etc.) of a particular residue...
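The abstract says only that local sequence information serves as the SVR input. A common way to build such input, assumed here rather than taken from the paper, is a fixed-width window centred on each residue, one-hot encoded, with padding beyond the termini. The function name `encode_windows` and the window half-width of 3 (7 residues) are hypothetical choices for illustration.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode_windows(sequence, half_width=3):
    """Return one feature vector per residue (hypothetical encoding).

    Each vector concatenates one 21-slot one-hot block for every position
    i - half_width .. i + half_width. Slots 0-19 mark the standard amino
    acids; slot 20 flags a position outside the sequence (terminal padding)
    or a non-standard residue.
    """
    block = len(AMINO_ACIDS) + 1
    features = []
    for i in range(len(sequence)):
        vec = []
        for j in range(i - half_width, i + half_width + 1):
            onehot = [0] * block
            if 0 <= j < len(sequence) and sequence[j] in INDEX:
                onehot[INDEX[sequence[j]]] = 1
            else:
                onehot[-1] = 1  # padding / unknown residue
            vec.extend(onehot)
        features.append(vec)
    return features
```

Each row of `encode_windows("MKTAYIAK")` would be paired with that residue's RSA value as a real-valued regression target and passed to an SVR implementation such as scikit-learn's `sklearn.svm.SVR`; the window width is a tunable assumption, not a value reported in the abstract.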
Annotation of the functional sites on the surface of a protein has been the subject of many studies. In this regard, the search for attributes and features characterizing these sites is of prime importance. Here, we present an implementation of a kernel-based machine learning protocol for identifying the residues of a DNA-binding protein that form the interface with the DNA. Sequence and structural features...
Protein sequence alignments reveal the evolutionary information shared by homologous sequences. Traditional sequence alignment methods use only sequence information and ignore the structural information available from the template. Recently, Kleinjung et al. developed a contact-based sequence alignment method that uses the structural information from side-chain contacts. Alignment scores are provided by the CAO...
Subcellular location is one of a protein's key functional characteristics, as proteins must be localized correctly at the subcellular level to perform their normal biological functions. In this paper, all motifs in PROSITE were examined, and those indicative of eukaryotic protein subcellular localization were picked out. A corresponding motif module was built and combined with our former work: LOCSVMPSI...
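Screening sequences for PROSITE motifs usually starts by translating PROSITE pattern syntax into a regular expression. The converter below is a minimal sketch of that step, not the authors' motif module; it covers the common syntax elements (`-` separators, `x`, `[..]` alternatives, `{..}` exclusions, `(n)` and `(n,m)` repeats, `<` and `>` terminal anchors) but not `>` inside brackets such as `[G>]`. The function name and the example pattern are illustrative only.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regex string."""
    pattern = pattern.strip().rstrip(".")  # PROSITE patterns end with '.'
    regex = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        regex += "^"
        pattern = pattern[1:]
    anchor_end = pattern.endswith(">")  # anchored at the C-terminus
    if anchor_end:
        pattern = pattern[:-1]
    for element in pattern.split("-"):
        # Split an element into its residue part and an optional (n) / (n,m) repeat.
        core, low, high = re.fullmatch(
            r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element
        ).groups()
        if core == "x":
            piece = "."  # any residue
        elif core.startswith("{"):
            piece = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            piece = core  # single residue or [..] class, already regex syntax
        if low is not None:
            piece += "{" + low + ("," + high if high else "") + "}"
        regex += piece
    if anchor_end:
        regex += "$"
    return regex
```

For example, `prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]")` yields `C.{2,4}C.{3}[LIVMFYWC]`, which can be passed to `re.search` or `re.finditer` over each protein sequence.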
Proteolytic processing occurs predominantly at basic amino acid residues. The existence of cleavage sites not recognized by the rules proposed in previous studies prompts us to test whether, and to what extent, such sites are cleaved. Because the cleavage site database from SWISS is imbalanced, SMOTE combined with Tomek links is applied to over-sample the data. A neural network method is then developed to predict...
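The two resampling steps named in the abstract can be sketched in plain Python for a two-class problem. SMOTE creates synthetic minority samples by interpolating between a minority point and one of its k nearest minority neighbours; Tomek-link cleaning then removes opposite-class pairs that are each other's nearest neighbour. This is a generic illustration, not the paper's pipeline: the function names, `k = 5`, the fixed random seed, and the choice to drop both members of each link are assumptions (a common variant drops only the majority-class member).

```python
import math
import random


def _nearest(points, i):
    """Index of the point closest (Euclidean) to points[i], excluding i."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: math.dist(points[i], points[j]))


def smote(minority, n_new, k=5, rng=None):
    """Create n_new synthetic points; needs at least two minority samples."""
    rng = random.Random(0) if rng is None else rng
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: math.dist(x, p))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # position along the segment x -> nn
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(x, nn)))
    return synthetic


def remove_tomek_links(points, labels):
    """Drop both members of every opposite-class mutual-nearest-neighbour pair."""
    nearest = [_nearest(points, i) for i in range(len(points))]
    drop = {i for i in range(len(points))
            if nearest[nearest[i]] == i and labels[nearest[i]] != labels[i]}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]


def smote_tomek(points, labels, minority_label, k=5, seed=0):
    """Oversample the minority class to the majority size, then clean."""
    rng = random.Random(seed)
    minority = [p for p, y in zip(points, labels) if y == minority_label]
    n_new = len(points) - 2 * len(minority)  # assumes exactly two classes
    new = smote(minority, max(n_new, 0), k, rng)
    pts = list(points) + new
    lbs = list(labels) + [minority_label] * len(new)
    return remove_tomek_links(pts, lbs)
```

For production use, the imbalanced-learn library provides the same combination as `imblearn.combine.SMOTETomek`.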
This paper presents an application of neural networks to locating the copper-binding sites of metalloproteins. Using annotated metalloproteins downloaded from the PDB, sequences containing copper-binding sites were extracted. By further identifying the particular core segments of the copper-binding sites, the input and output information used for training is refined. Moreover, this paper investigates the number of...
Multiple sequence alignment is a central topic of extensive research in computational biology. In essence, two or more protein sequences are compared in order to evaluate their similarity. This work reports a methodology for parallel processing of a multiple sequence alignment algorithm (ClustalW) in an environment of networked computers. A detailed description of the modules that compose the distributed...
In this paper, we present the design and implementation of a protein structure data and analysis system intended for in-lab use in analyzing proprietary data. It is capable of storing both public protein data, such as the data in the Protein Data Bank (PDB) (Berman et al., 2000), and life scientists' proprietary data. This toolkit is targeted at life scientists who want to maintain proprietary protein...
Subcellular location is one of a protein's key functional characteristics, as proteins must be localized correctly at the subcellular level to perform their normal biological functions. In this work, a novel hybrid-classifier prediction method is introduced, which uses evolutionary information and sequence-order information to improve prediction performance. Prediction results on different data sets show...
Tandem mass spectrometry followed by database search is the preferred method for protein identification in high-throughput proteomics. However, standard analysis methods give rise to highly redundant lists of proteins, with many proteins identified by the same sets of peptides. In essence, such a list contains all proteins that might be present in the sample. Here we present an algorithm that eliminates...
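The excerpt ends before the authors' algorithm is described, so the sketch below illustrates a common parsimony idea rather than their method. Proteins identified by exactly the same peptide set are merged into one group, and any group whose peptides are a strict subset of another group's peptides is discarded because the larger group already explains them. The function name and input layout (accession mapped to peptides) are hypothetical.

```python
from collections import defaultdict


def parsimonious_groups(protein_peptides):
    """Collapse a redundant protein list (illustrative parsimony sketch).

    protein_peptides maps a protein accession to an iterable of the
    peptides that identified it. Returns a dict mapping each surviving
    frozenset of peptides to the sorted accessions sharing that set.
    """
    # Step 1: proteins with identical peptide sets become one group.
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    # Step 2: drop groups whose peptide set is a strict subset of another's.
    return {
        peps: sorted(members)
        for peps, members in groups.items()
        if not any(peps < other for other in groups)
    }
```

This subset rule is weaker than a full minimal set cover: a protein whose peptides are covered only by the union of two other groups survives here, whereas a set-cover formulation would remove it.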
The reverse engineering paradigm has lately been receiving increasing attention in computational molecular biology. One of its goals is to understand how gene regulatory networks (complex systems of genes, proteins and other molecules) function and interact to carry out specific cell functions. We present an approach for inferring the complex causal relationships among genes from microarray experimental data...
Haematopoietic cytokines are important in the regulation of haematopoiesis and immune responses, and they can also influence lymphocyte development. Hundreds of members of several cytokine families have been discovered by various methods. However, the fast evolutionary rate and low similarity of cytokines prevent novel members of a cytokine family from being completely identified with classical tools...
The accelerating availability of protein sequences and structures has transformed both the theory and practice of computational biology. The current systems of nomenclature for proteins remain divergent even when experts appreciate the underlying similarities. Interoperability of protein databases is limited by a lack of progress in the way biologists describe and conceptualize the shared biological...
In this paper, a new method that uses latent semantic analysis (LSA) to represent protein sequences is proposed for the protein classification problem. A protein is vectorized according to its content of biological words, namely patterns and motifs, which are generated by the TEIRESIAS algorithm and the MEME/MAST system, respectively. More precise description vectors of proteins are obtained through...
Protein domain-domain interaction pairs supply functional information about the interacting proteins, and finding interaction motif pairs in protein-protein interaction databases can reveal the essence of protein interactions in depth. Up to now, there has been little research on the prediction of interaction motif pairs within domain-domain interaction pairs. In this paper, we propose a new method to...