Computing the semantic similarity of terms is an important application of the Gene Ontology. Moreover, it is an important way for biologists to deal with the semantic heterogeneity of biological data sets. Existing approaches in this field are based on semantic distance or information quantity. The author proposes a comprehensive approach where the similarity is first...
MRMPath is a system that processes protein sequences to compute the theoretical masses of peptides and their fragments following digestion of the protein. Experimentally, these fragments are created using enzymes prior to being processed in a mass spectrometer. In MRMPath, these fragments are created theoretically. MRMPath can process protein sequences from three different sources:...
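As a rough illustration of the theoretical digestion step described in the MRMPath abstract, the following Python sketch cleaves a sequence and sums monoisotopic residue masses. The cleavage rule (trypsin-like: cut after K or R unless the next residue is P) and the names `digest` and `peptide_mass` are assumptions made for illustration; the abstract does not say which enzymes or mass conventions MRMPath itself uses.

```python
# Monoisotopic residue masses in daltons (standard values).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER_MASS = 18.01056  # one water is added per free peptide


def digest(sequence):
    """Split a protein sequence with an assumed trypsin-like rule:
    cleave after K or R, except when the next residue is P."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])  # C-terminal remainder
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass: sum of residue masses plus one water."""
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


if __name__ == "__main__":
    for peptide in digest("AKRPGKL"):
        print(peptide, round(peptide_mass(peptide), 4))
```

Missed cleavages, modifications, and fragment-ion masses are deliberately left out of this sketch.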
Since existing high-throughput sequencing systems were originally designed for single genome assembly, they cannot distinguish and simultaneously assemble multiple closely related sequences, nor can they estimate their relative abundances. This paper presents a novel approach in ViSpA software for quasispecies spectrum reconstruction. On simulated data, ViSpA accurately reconstructs up to 29 (out of...
Multiple sequence alignment (MSA) is a basic task in sequence analysis. In the development of MSA approaches, M-Coffee [1] was proposed as a meta-method for assembling the outputs of different individual multiple aligners into one single MSA to boost accuracy. The authors showed that M-Coffee outperformed individual alignment methods. In this paper, we propose an improvement of M-Coffee,...
Combining advanced mathematical models to predict protein structure is one of the most challenging problems in structural biology. Conditional Random Fields (CRF) have been shown to be a powerful algorithm in many informatics applications and are widely used in protein structure prediction. CRFsampler can automatically optimize more than ten thousand parameters quantifying the relationship among primary sequence and backbone...
Selecting informative genes from microarray gene expression data is the most important task when analyzing such large amounts of data. Mining genes that have regulatory relations among thousands of genes is essential. To meet this need, a number of methods have been proposed from various points of view. However, most existing methods solely focus on gene expression values themselves without...
We have cloned and expressed a novel earthworm fibrinolytic enzyme (EFE) of Lumbricus rubellus in Pichia pastoris. Its cDNA sequence (GenBank Accession No. DQ202401) revealed a 738 bp region containing an intact ORF that encodes a protein of 245 amino acid residues, containing a signal peptide of 7 amino acid residues and a mature peptide of 238 amino acid residues, designated as EFE F238. Its cDNA...
Quantifying residue variability at each column of a multiple sequence alignment of amino acids helps indicate how similar the residues are, and highlights the significance of each position from the perspective of structure, function, and evolution. It is becoming increasingly clear that the groups of amino acids that allow conserved replacement vary with the position...
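To make the idea of per-column variability concrete, here is a minimal Python sketch that scores each alignment column with Shannon entropy. Entropy is a common baseline chosen here for illustration only; it is not necessarily the measure proposed in the abstract above, and the function name `column_entropies` and the choice to ignore gap characters are assumptions.

```python
from collections import Counter
from math import log2


def column_entropies(alignment, gap="-"):
    """Return the Shannon entropy (in bits) of each alignment column.

    `alignment` is a list of equal-length strings. Gap characters are
    ignored (an assumption of this sketch); an all-gap column scores 0.0.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    entropies = []
    for column in zip(*alignment):
        residues = [c for c in column if c != gap]
        total = len(residues)
        counts = Counter(residues)
        # 0 bits means fully conserved; higher values mean more variable.
        entropy = sum(-(n / total) * log2(n / total) for n in counts.values())
        entropies.append(entropy)
    return entropies


if __name__ == "__main__":
    msa = ["MKV-A", "MRVLA", "MKILA", "MQVLG"]
    for index, score in enumerate(column_entropies(msa)):
        print(index, round(score, 3))
```

Plain entropy treats every substitution as equally costly, so it does not capture the position-dependent conserved replacement groups that the abstract mentions.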
An open reading frame (ORF) containing 675 nucleotides was obtained by sequencing the DEV gene libraries constructed by our laboratory. The ORF was identified as the DEV UL45 gene by aligning it against the GenBank database using BlastN and ORF Finder. Specific primers were designed, and PCR products containing this ORF were cloned into the PMD18-T vector. Dot blot confirmed the UL45 gene...
This paper presents a novel approach, namely SSVS, to improve the secondary structure prediction of proteins. In this work, a Radial Basis Function Neural Network is trained to combine different answers found by different secondary structure prediction techniques to produce superior answers. SSVS is tested with three of the well-known benchmarks in this field. The results demonstrate the superiority...
It is apparent that the challenges facing scientific software developers are quite different from those facing their commercial counterparts. Among these differences are the challenges posed by the complex and uncertain nature of the science. There is also the fact that many scientists have experience of developing their own software, albeit in a very restricted setting, leading them to have unrealistic...
Cell function and development are regulated by complex networks of genes, proteins and other components through their mutual interactions. These networks are called gene regulatory networks (GRNs). GRNs are used to reveal the fundamental gene regulatory mechanisms, and to determine the causes of many diseases and the interactions between drugs and their targets. The introduction of experimental...
In this paper, we describe the neural grammar network (NGN) and its application to quantitative structure-activity relationship (QSAR) in computational chemistry. The NGN is a novel machine learning device that applies the generic function approximation capability of a dynamic recursive neural network to the syntactic structure of a parsed string. In our QSAR task, we represent each molecule by a...
A computational mutagenesis methodology utilizing a four-body, knowledge-based, statistical contact potential is applied toward quantifying sequence-structure compatibility changes in bacteriophage T4 lysozyme upon single amino acid replacements. We show that these scalar scores correlate with experimentally measured stability changes to the protein due to the mutations. For each mutant, the approach...
The distribution, directionality and motility of the actin fibers control cell shape, affect cell function and are different in cancer versus normal cells. Quantification of actin structural changes is important for further understanding differences between cell types and for elucidation of the effects and dynamics of drug interactions. We have developed an image analysis framework for quantifying...
O-glycosylation of mammalian proteins is studied. It is specific to serine or threonine, though no consensus sequence is known yet. We have applied support vector machines (SVM) to the prediction of O-glycosylation sites from various kinds of protein information, aiming to investigate the conditions for glycosylation and elucidate its mechanisms. In the present study, we focus on the distribution...
Causal structure discovery is an important problem for protein sequences and for gene-gene interactions in gene expression data: it can reveal the elementary structure of a protein sequence and the gene-gene interactions reflected in the expression levels of genes within the cell. In this paper, we investigate feature-based causal structure learning methods for protein sequence and gene expression data respectively...
We provide a unified description of (weighted) alpha shapes, beta shapes and the corresponding simplicial complexes. We discuss their applicability to various protein-related problems. We also discuss filtrations of alpha shapes and touch upon related persistence issues. We claim that the full potential of alpha-shapes and related geometrical constructs in protein-related problems yet remains to be...
From the perspective of topology, the native structure of a protein molecule can be represented as a complex network. In the network, amino acids are vertices and the interactions between them act as edges. The networks of 80 proteins that belong to four structural classes are constructed at three length scales: Protein Contact Networks, Long-range Interaction networks and Short-range Interaction...
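The network representation described above can be sketched in a few lines of Python: residues are vertices, and an edge joins two residues that lie within a distance cutoff. The inputs (one 3D coordinate per residue, for example a C-alpha position), the 7.0 Å cutoff, the sequence-separation threshold of 12, and the function names are all assumptions for illustration; the abstract does not state the parameters actually used.

```python
from itertools import combinations
from math import dist


def contact_edges(coords, cutoff=7.0):
    """Return edges (i, j), i < j, for residue pairs within `cutoff`.

    `coords` holds one (x, y, z) tuple per residue, in sequence order.
    The cutoff value is an assumed, illustrative default.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(coords), 2):
        if dist(a, b) <= cutoff:
            edges.append((i, j))
    return edges


def split_by_range(edges, long_range=12):
    """Split edges by sequence separation |i - j| (threshold is assumed).

    Returns (short_range_edges, long_range_edges).
    """
    short_edges = [(i, j) for i, j in edges if j - i <= long_range]
    long_edges = [(i, j) for i, j in edges if j - i > long_range]
    return short_edges, long_edges


if __name__ == "__main__":
    # Four toy residue positions, not taken from any real structure.
    toy = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (3.8, 3.8, 0.0)]
    all_edges = contact_edges(toy)
    print(all_edges)
    print(split_by_range(all_edges, long_range=2))
```

The pairwise loop is quadratic in the number of residues, which is usually acceptable for single proteins; a spatial index would be the natural replacement for larger systems.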
HMMer is a widely-used bioinformatics software package that uses profile HMMs (Hidden Markov Models) to model the primary structure consensus of a family of protein or nucleic acid sequences. However, with the rapid growth of both sequence and model databases, it is more and more time-consuming to run HMMer on traditional computer architecture. In this paper, the computation kernel of HMMer, P7Viterbi,...