High-order drug-drug interactions (DDIs) are common, particularly among elderly people. Detecting such interactions through in vivo or in vitro experiments is highly non-trivial. In this paper, we present SVM-based classification methods to predict whether a high-order directional drug-drug interaction (HoDDDI) instance is associated with adverse drug reactions (ADRs) and induced side effects. Specifically,...
Protein-protein interaction (PPI) networks are a valuable biological data source containing rich information useful for protein function prediction. PPI network data sets obtained from high-throughput experiments are known to be noisy and incomplete. By modeling PPI data as a graph, research efforts are being made in the literature to improve the performance of protein function prediction by extending...
As an important branch of biomedical information extraction, Protein-Protein Interaction extraction (PPIe) from biomedical literature has been widely researched, and machine learning methods have achieved great success in this task. However, the word features generally adopted in existing methods suffer badly from the vocabulary gap and data sparseness, weakening the classification performance....
Previous research in information extraction from biological texts has focused intensively on the recognition of named entities, such as gene, protein or disease names, and on the extraction of simple relations between these entities, such as protein-protein interactions. Recently, the focus of research has been moving to higher levels of information extraction such as co-reference resolution and event extraction...
Due to the rapid growth in biological technology, the development of high-quality information extraction systems is needed and still remains a challenge. Several recently proposed approaches to biological relation extraction are based on machine learning techniques on lexical and syntactic information. Most use the dependency path between two genes/proteins instead of the whole dependency tree of...
Knowledge about protein-protein interactions unveils the molecular mechanisms of biological processes. This paper presents a multiple kernel learning-based approach to automatically extracting protein-protein interactions from biomedical literature. Experimental evaluations show that our approach can achieve state-of-the-art performance with respect to comparable evaluations, with 64.88% F-score...
In this paper, a new algorithm related to feature selection, a method widely used in data mining, machine learning and pattern recognition, is proposed. The classical Fukunaga-Koontz Transform is extended to a binary kernel classifier. We used cDNA microarrays to assess 11,000 gene expression profiles in 60 human cancer cell lines used in a drug discovery screen by the National Cancer Institute and...
Relationship extraction (RE) from biomedical literature is an important and challenging problem in both text mining and bioinformatics. Although various approaches have been proposed to extract protein-protein interaction types, their accuracy rates leave considerable room for exploring more effective methods. In this paper, two supervised learning algorithms based on newly-defined "bio-semantic...
We proceed from a method for protein structure comparison in which information about the geometry and physico-chemical properties of such structures is represented in the form of labeled point clouds, that is, sets of labeled points in three-dimensional Euclidean space. Two point clouds are then compared by computing an optimal spatial superposition. This approach has recently been introduced in...
In this paper, we define a new research problem for mining approximate repeating patterns (ARP) with gap constraints, where the appearance of a pattern is subject to an approximate matching, which is very common in biological sciences. To solve the problem, we propose an ArpGap (Approximate repeating pattern mining with Gap constraints) algorithm with three major components for approximate repeating...
The B-factor reflects an atom's uncertainty about its average position within a crystal structure and is highly correlated with protein function. In this article, we propose a novel approach to predict the real value of the B-factor. We first extract features from the protein sequences and their evolution information, then apply a random forest to select the important features, which are further input...
We present the application of a recently proposed semi-supervised learning strategy - feature coupling generalization (FCG) - to the task of protein-protein interaction extraction from biomedical literature. FCG is a framework that generates new features from the relatedness of two special types of old features: example-distinguishing features (EDFs) and class-distinguishing features (CDFs). Their relatedness...
In hyperspectral image analysis the objective is to unmix a set of acquired pixels into pure spectral signatures (endmembers) and corresponding fractional abundances. The non-negative matrix factorization (NMF) methods have received a lot of attention for this unmixing process. Many of these NMF based unmixing algorithms are based on sparsity regularization encouraging pure spectral endmembers, but...
O-glycosylation of mammalian proteins is studied. It is serine or threonine specific, though no consensus sequence is known yet. We have applied support vector machines (SVMs) to the prediction of O-glycosylation sites from various kinds of protein information, aiming to investigate glycosylation conditions and elucidate the mechanisms. In the present study, we focus on the distribution...
Remote homology detection and fold recognition are the central problems in protein classification. In real applications, kernel algorithms that are both accurate and efficient are required for classification of large databases. We explore a class of partial profile alignment kernels to be used with support vector machines (SVMs) for remote homology detection and fold recognition. While existing profile-based...
In multi-instance learning, each example is represented by a bag of instances while associated with a binary label. Under standard multi-instance learning settings, one example is labeled as a positive bag if at least one of its instances is positive. Otherwise, it is labeled as a negative bag. Although based on the above assumption, standard multi-instance learning has achieved much success in solving...
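The standard multi-instance labeling rule described in the abstract above can be sketched in a few lines. This is only an illustration of the stated assumption (a bag is positive if at least one of its instances is positive), not the method of the paper; the function name `bag_label` and the example bags are hypothetical.

```python
from typing import Dict, List, Sequence


def bag_label(instance_labels: Sequence[bool]) -> bool:
    """Standard multi-instance assumption: a bag is positive
    if at least one of its instances is positive, otherwise negative."""
    return any(instance_labels)


# Hypothetical bags of instance-level labels (illustrative data only).
bags: Dict[str, List[bool]] = {
    "bag_a": [False, False, True],   # one positive instance
    "bag_b": [False, False, False],  # no positive instance
}

# Derive one binary label per bag from its instance labels.
labels = {name: bag_label(instances) for name, instances in bags.items()}

for name, label in labels.items():
    print(name, "positive" if label else "negative")
```

In practice the instance labels are unknown and only the bag labels are observed; the sketch shows the direction of the assumption, not how a learner would recover instance labels.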
We present a comparative evaluation of a large number of anomaly detection techniques on a variety of publicly available as well as artificially generated data sets. Many of these are existing techniques while some are slight variants and/or adaptations of traditional anomaly detection techniques to sequence data.
Supervised approaches to data mining are particularly appealing as they allow for the extraction of complex relations from data objects. To facilitate their application in different areas, ranging from protein-protein interaction in bioinformatics to text mining in computational linguistics research, a modular and general mining framework is needed. The major constraint to the generalization...
We used human protein-protein interaction (PPI) data transformed into documents to perform text-mining via concept clusters. The advantage of text-mining PPI data is that words (proteins) that are very sparse or over-abundant can be dropped, leaving the remaining bulk of data for clustering and rule mining. Libraries of tissue-specific binary PPIs were constructed from a list of 36,137 binary PPIs...
Protein-protein interaction (PPI) extraction has become a research focus. Many methods have been applied to this domain, such as supervised learning approaches. This paper applies a support vector machine (SVM) to extract PPIs, based on several lexical features and one syntactic feature obtained through a link grammar parser. Due to the complexity of syntax, different sentence structures cannot...