One of the major goals in microarray data analysis is to identify biomarkers and build a classification model for future prediction. Many traditional statistical models based on microarray data alone often fail to identify biologically meaningful genes, which are expected to determine clinical outcomes synergistically, through interactions, rather than individually. In this paper,...
Proteins function through interactions with other proteins, compounds, RNA and DNA. Prediction of protein interface sites is a key step in providing clues to the function of a protein, and is becoming increasingly relevant to drug discovery. In this paper, combining the protein features with the theory of granular computing of quotient space based on protein-protein interaction sites classification...
A new machine learning approach has been developed in this study for sequence-based prediction of DNA-binding residues in proteins. The approach used both the labeled data instances collected from the available structures of protein-DNA complexes and the abundant unlabeled data found in protein sequence databases. The evolutionary information contained in the unlabeled sequence data was represented...
Mass spectrometry has become the most widely used measurement technique in proteomics research. High feature dimensionality and small dataset size are two major limitations that restrict classification accuracy in mass spectrometry data analysis. To improve data mining results, two major issues need to be addressed: feature extraction and feature selection. The quality of the feature set determines...
Determining the 3D structure of a protein by X-ray crystallography is a costly and time-consuming process. One of the major reasons is that the protein must first be purified and crystallized, and the failure rate of protein crystallization is quite high. It is therefore desirable to use a computational method to predict protein crystallizability from primary structure information before...
This paper presents a grid portal for protein secondary structure prediction, developed using the services of Aneka, a .NET-based enterprise grid technology. Research scientists use the portal to discover new prediction structures in parallel. An SVM (support vector machine)-based prediction algorithm is used with 64 sample protein sequences as a case study to demonstrate the potential...
Protein phosphorylation is an important step in many biological processes, such as the cell cycle, membrane transport, and apoptosis. We design a new classifier ensemble approach, called Bagging-Adaboost Ensemble (BAE), for predicting eukaryotic protein phosphorylation sites, which incorporates the bagging and AdaBoost techniques into the classifier framework to improve the accuracy,...
DNA-binding proteins play an important role in various intra- and extra-cellular activities. The key part of such a protein is its DNA-binding region, also called the DNA-binding domain (DBD). However, DBDs are hard to find by homology search or hidden Markov models because their sequences vary widely. In this work, we develop a kernel-based machine learning method by combining multiple...
The support vector machine (SVM) method based on n-peptide composition (Yu et al., Proteins: Struct. Funct. Genet. 2003; 50:531-536) is used to predict the subcellular localizations of proteins. For an unbiased assessment of the results, we apply our approach to two independent data sets: one set consisting of two parts (Reinhardt and Hubbard, Nucleic Acids Res. 1998; 26:2230-2236): the prokaryotic...
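To illustrate what an n-peptide composition feature vector looks like, the sketch below computes the normalized frequency of each of the 20^n possible n-peptides in a sequence (n = 2 gives the usual 400-dimensional dipeptide composition). It is a minimal sketch, not the encoding from Yu et al.: the function name `n_peptide_composition` and the choice to skip windows containing non-standard residues are assumptions. The resulting vector could then be passed to any SVM implementation.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_peptide_composition(sequence, n=2):
    """Return the normalized frequency of every possible n-peptide in `sequence`.

    The vector has 20**n entries, ordered lexicographically over AMINO_ACIDS.
    Windows containing non-standard residues (e.g. 'X') are skipped; this is
    an assumption of the sketch, not a documented rule.
    """
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    index = {k: i for i, k in enumerate(keys)}
    counts = [0] * len(keys)
    total = 0
    seq = sequence.upper()
    # Slide a window of length n over the sequence and count each n-peptide.
    for i in range(len(seq) - n + 1):
        window = seq[i:i + n]
        if window in index:
            counts[index[window]] += 1
            total += 1
    if total == 0:
        return [0.0] * len(keys)
    return [c / total for c in counts]


vec = n_peptide_composition("ACDAC", n=2)
print(len(vec))  # 400 features for dipeptides
```

For "ACDAC" the dipeptide windows are AC, CD, DA, AC, so the AC entry is 0.5 and the entries sum to 1.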
Gene expression levels are influenced significantly by the presence or absence of cis-regulatory elements or motifs. This paper presents classification systems in which the occurrences of both activator and repressor motifs constitute important inputs in predicting whether a gene will be up-regulated, down-regulated, or neither (neutral). We have experimented with several approaches for classification...
This paper uses a support vector machine (SVM) to predict protein disordered regions. However, the feature set used here contains 440 features, so both the time and space complexity of SVM training and testing are high. This paper therefore proposes a hybrid feature selection mechanism to reduce the dimensionality of the feature space. The filter and wrapper feature selection methods...
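The filter half of a hybrid filter/wrapper scheme can be sketched as follows: each feature is scored independently of any classifier, and only the top k are kept. The abstract does not say which filter criterion is used, so the Fisher score here is an assumption, as are the function names `fisher_scores` and `filter_top_k`; a wrapper stage (for example, cross-validated SVM accuracy on candidate subsets) would then refine the surviving features.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Fisher score of each feature for binary labels y in {0, 1}.

    score_j = (mean_pos - mean_neg)^2 / (var_pos + var_neg); higher is better.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in pos]
        b = [row[j] for row in neg]
        num = (mean(a) - mean(b)) ** 2
        den = pvariance(a) + pvariance(b)
        if den > 0:
            scores.append(num / den)
        else:
            # Zero within-class variance: perfectly separating if the means differ.
            scores.append(float("inf") if num > 0 else 0.0)
    return scores


def filter_top_k(X, y, k):
    """Keep the k highest-scoring features; return their indices and the reduced data."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])  # restore original feature order
    return keep, [[row[j] for j in keep] for row in X]
```

Because the filter never trains a classifier, it is cheap enough to shrink a 440-dimensional feature set before the more expensive wrapper search.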
Classification studies based on microarray data have proved useful for tasks such as predicting patient class. At the same time, more and more biological information about gene regulation networks has been gathered, mainly in the form of graphs. Incorporating the a priori biological information encoded by these graphs has become an important means of improving classification performance. We present a method...
An important application of microarrays is to identify the relevant genes, among thousands of genes, for phenotypic classification. The performance of a gene selection algorithm is often assessed in terms of both predictive capacity and computational efficiency, but predictive capacity of selected features receives more attention than does computational efficiency. However, in gene selection problems,...
The essentiality of a gene or protein is important for understanding the minimal requirements for cellular survival and development. Numerous computational methods have been proposed to detect essential proteins from large protein-protein interaction (PPI) datasets. However, only a handful of the essential proteins they detect are shared between them. This suggests that the methods may be complementary...
Microarray datasets are often limited to a small number of samples with a large number of gene expression measurements. Therefore, dimensionality reduction through feature/gene selection is highly important for classification. In this paper, a feature perturbation method we previously introduced is applied to gene selection from microarray data. A publicly available colon cancer dataset...
Cancer is a group of complex diseases in which a relatively large number of genes are involved. One of the main goals of cancer research is to identify genes that are causally relevant to the development and progression of cancer. The growing number of identified cancer genes and the availability of genomic and proteomic data provide opportunities to identify cancer genes by computational methods. In this work,...
Dilated cardiomyopathy is one of the leading causes of heart failure. Recent advances in microarray technology promise significant progress in understanding the molecular mechanisms underlying dilated cardiomyopathy and heart failure. Several microarray studies have successfully yielded a set of signature genes associated with heart failure. However, it has been found that the overlap of these...
Kernel methods are used to tackle a variety of learning tasks, including classification, regression, ranking, clustering, and dimensionality reduction. The choice of an appropriate kernel is often left to the user, but a poor choice may lead to suboptimal performance. Instead, sample points can be used to learn a kernel function appropriate for the task by selecting one out of a family of kernels...
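One simple way to use sample points to select one kernel out of a family is to pick the Gaussian kernel width that maximizes kernel-target alignment on the training data. This criterion is a swapped-in illustration, not necessarily the method of the paper; the function names and the candidate grid of widths are hypothetical.

```python
import math


def gaussian_kernel_matrix(X, sigma):
    """Gram matrix of the Gaussian kernel exp(-||x - z||^2 / (2 sigma^2))."""
    return [
        [math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2)) for xj in X]
        for xi in X
    ]


def alignment(K, y):
    """Kernel-target alignment <K, yy^T>_F / (||K||_F * ||yy^T||_F) for y in {-1, +1}."""
    n = len(y)
    inner = sum(K[i][j] * y[i] * y[j] for i in range(n) for j in range(n))
    norm_k = math.sqrt(sum(K[i][j] ** 2 for i in range(n) for j in range(n)))
    return inner / (norm_k * n)  # ||yy^T||_F = n when every y_i is +1 or -1


def select_sigma(X, y, sigmas):
    """Pick the kernel width from a candidate family that maximizes alignment."""
    return max(sigmas, key=lambda s: alignment(gaussian_kernel_matrix(X, s), y))


X = [[0.0], [0.1], [5.0], [5.1]]
y = [-1, -1, 1, 1]
print(select_sigma(X, y, [0.01, 1.0, 100.0]))
```

On this toy data a very narrow kernel makes every point look unrelated, a very wide one makes every point look alike, and the intermediate width, which groups points by class, scores highest.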
In this paper we investigate the application of the recently developed margin-based feature elimination (MFE) method for feature selection in support vector machines to high-dimensional, small-sample-size data from the DNA microarray domain. We compared the performance of MFE with that of the well-known recursive feature elimination (RFE) method. Our results show that MFE outperforms RFE in terms of generalization...
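For reference, the RFE baseline works by repeatedly training a linear classifier and discarding the feature with the smallest absolute weight. The sketch below shows that loop; to stay dependency-free it uses a simple perceptron as a stand-in for the linear SVM that SVM-RFE normally uses, and it removes one feature per iteration. The helper names are hypothetical, and MFE itself is not shown, since the abstract does not describe it.

```python
def train_perceptron(X, y, epochs=50):
    """Train a linear perceptron (labels in {-1, +1}); return weights and bias.

    Stand-in for the linear SVM that SVM-RFE would normally train.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # Update only on misclassified (or on-boundary) samples.
            if yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) <= 0:
                w = [wj + yi * xj for wj, xj in zip(w, xi)]
                b += yi
                mistakes += 1
        if mistakes == 0:  # data separated; stop early
            break
    return w, b


def rfe(X, y, n_keep):
    """Recursive feature elimination: retrain, then drop the feature with smallest |w|."""
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        Xs = [[row[j] for j in remaining] for row in X]
        w, _ = train_perceptron(Xs, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        del remaining[weakest]
    return remaining  # indices of the surviving original features


X = [[2, 0.1, 0.0], [3, -0.1, 0.1], [-2, 0.1, -0.1], [-3, -0.1, 0.0]]
y = [1, 1, -1, -1]
print(rfe(X, y, n_keep=1))
```

Retraining after each removal is what makes the elimination "recursive": the weights of the remaining features are re-estimated without the feature that was just dropped.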
Information on the secondary structures of amino acid residues in proteins provides valuable clues for predicting protein 3-D structure and function. Although numerous computational techniques have been applied to predict protein secondary structure (PSS), only a few studies have dealt with discovering the logic rules underlying the prediction itself. Such rules offer interesting links between the...