EGEPT (Middle East GenBank Post) is a database that monitors submissions to the GenBank nucleotide database from Middle East countries. The data in EGEPT is browsable by country, institute, author, organism, and related publications. Statistics about the dataset are provided, and charts that compare the Middle East countries with each other are generated automatically. EGEPT revealed that Qatar, Egypt,...
Data management has become a critical challenge faced by a wide array of scientific disciplines in which the provision of sound data management is pivotal to the achievements and impact of research projects. Massive and rapidly expanding amounts of data combined with data models that evolve over time contribute to making data management an increasingly challenging task that warrants a rethinking of...
Relation extraction is a challenging task in biomedical text mining due to the complexity of sentences in the biomedical literature. In this paper, we address the multi-class relation extraction problem in biomedical literature using a Maximum Entropy model with simple word features. The proposed method is applied to extract protein-protein interactions. Experiments show the method achieves an accuracy...
This paper puts forward a classification and discrimination method based on rough sets-partial least squares-discriminant analysis (RS-PLS-DA). The method proved feasible and effective when tested on a database of diabetes complications.
Many previous studies argue convincingly that mining frequent subgraphs is especially useful. Many interesting hidden frequent patterns cannot be found by mining a single graph. Previous approaches such as Quasi-Clique have had little success with the hub problem. In this paper, we introduce a new concept, Correlated-Quasi-Clique, and develop a novel algorithm, CoClique, to address...
Computational approaches have been applied in many different biological application domains. When such tools are based on conventional computation, they have shown limitations in approaching complex biological problems. In the present study, a computational evolutionary environment (CEE) is proposed as a tool to extract classification rules from biological datasets. The main goal of the proposed approach...
Though functional relationship analysis of gene products is useful, a convenient and user-friendly tool to measure the functional similarity of genome-wide gene products in multiple species is still not available. We computed the genome-wide functional similarity of gene products in human, mouse and rat based on our algorithm. Database and web services were built based on the precomputed...
PlaPID (Plant Protein Interaction Database) is a searchable database for protein-protein interactions in plants. It brings together high-confidence information derived from the published literature and several databases. PlaPID aims to provide an upgradeable reference database through comprehensive web services for studying plant protein interactions. The PlaPID database is freely accessible at http://www...
The following topics are dealt with: wireless communication; data communication; networking; information system; parallel processing; distributed processing; digital logic; signal processing; knowledge engineering; data engineering; pattern recognition; artificial intelligence; robotics; Web applications; neuro-fuzzy systems; Bangla language processing; software engineering; e-commerce; Web application;...
Dimensionality reduction applied to gene expression is challenging for machine learning algorithms due to a small number of samples and a high number of attributes. This paper proposes a preprocessing phase that applies the random projection method to microarray data. Experimental results are promising and show that the use of this method improves the performance of classification algorithms.
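As a rough illustration of the random projection idea named in the abstract above, the following minimal sketch maps each sample onto a lower-dimensional space using a Gaussian random matrix. The abstract does not say which projection matrix, target dimension, or classifier the authors used, so the function name `random_projection`, the Gaussian entries, and the `1/sqrt(target_dim)` scaling are all assumptions made for illustration.

```python
import math
import random


def random_projection(samples, target_dim, seed=0):
    """Project each row of `samples` onto `target_dim` random directions.

    Hypothetical helper: entries of the projection matrix are drawn from
    N(0, 1) and scaled by 1/sqrt(target_dim), a common choice that roughly
    preserves pairwise distances. This is not taken from the paper.
    """
    rng = random.Random(seed)
    n_features = len(samples[0])
    scale = 1.0 / math.sqrt(target_dim)
    # One column of random weights per output dimension.
    matrix = [
        [rng.gauss(0.0, 1.0) * scale for _ in range(target_dim)]
        for _ in range(n_features)
    ]
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(target_dim)]
        for row in samples
    ]


# Toy "expression matrix": 3 samples with 6 attributes each, reduced to 2.
expression = [
    [0.1, 2.3, 0.0, 1.5, 0.7, 3.2],
    [1.1, 0.3, 2.0, 0.5, 1.7, 0.2],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
reduced = random_projection(expression, target_dim=2)
```

The reduced rows would then be passed to whatever classifier is being evaluated; a fixed seed keeps the projection reproducible between training and test data.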
We are interested in exploiting domain knowledge for the task of candidate gene prioritization. In this paper, we present a new gene prioritization method that learns a probabilistic knowledge model and exploits it to prioritize candidate genes. The knowledge model is represented by a network of associations among domain concepts (e.g., genes) and is extracted from a domain database (e.g., protein-protein...
Traditional data models explicitly or implicitly assume that data are organized according to a single, "correct" classification scheme. However, there is increasing recognition that biological and other phenomena can be classified in multiple ways to accommodate varying perspectives. In this context, we review an approach to instance-based data modeling that might be useful for managing...
Knowledge about a protein's structure can help in understanding its function and has many applications in computer-aided drug design and protein engineering. In this paper we introduce a new methodology for predicting protein structural class using Emerging Subsequences (ES). In a sequence database, an emerging subsequence of a data class is a subsequence which occurs more frequently in that class rather...
Optimal estimation of the similarity distance between DNA sequences is performed through an alignment process. This optimal alignment is computed with a dynamic programming method, which runs in quadratic O(n × m) time. Filtering is a common technique introduced to speed up this optimal alignment process. A filtering process applied in heuristic tools such as BLAST and FASTA consists...
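The quadratic cost mentioned in the abstract above comes from filling one table cell per pair of positions in the two sequences. The sketch below shows this for a simple unit-cost edit distance; the abstract does not specify the scoring scheme, so the unit costs and the function name `edit_distance` are assumptions chosen only to illustrate the O(n × m) recurrence.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance between two sequences via dynamic programming.

    Every (i, j) pair is visited once, giving O(n * m) time. Only two rows
    of the table are kept, so memory is O(m).
    """
    n, m = len(a), len(b)
    prev = list(range(m + 1))  # distance from the empty prefix of a to b[:j]
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            substitution = 0 if a[i - 1] == b[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,                 # delete a[i-1]
                curr[j - 1] + 1,             # insert b[j-1]
                prev[j - 1] + substitution,  # match or substitute
            )
        prev = curr
    return prev[m]


same = edit_distance("ACGT", "ACGT")
one_deletion = edit_distance("ACGT", "AGT")
one_substitution = edit_distance("ACGT", "AGGT")
```

A filtering step of the kind the abstract describes would aim to run this full computation only on candidate pairs that pass a cheaper test first.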
The goal of gene normalization (GN) is to identify the unique database identifiers of genes and proteins mentioned in biomedical literature. A major difficulty in GN comes from inter-species gene ambiguity. That is, the same gene name can refer to different database identifiers depending on the species in question. In this paper, we introduce a method to exploit contextual information in an abstract,...
Data management is one of the fundamental requirements of ubiquitous computing. Existing data management systems are complex and provide a multitude of functionalities. Due to complexity and their monolithic architecture, it is difficult to tune these data management systems for consistent performance. In this paper, we extend our existing work of Cellular DBMS with the concept of autonomy. We present...
As databases can overlap each other, data matching that aims to identify data records or elements describing the same object is one of the fundamental problems in physical integration of databases. Matching results can be applied to induce more accurate and complete object descriptions, remove data redundancy, check data consistency and generate cross-links. In this paper, we present a multilevel...
To address the problem of clarifying the mechanisms of traditional Chinese medicine (TCM), a systems biology platform was constructed using related achievements as the basis, entity grammar systems as the framework, and database construction, data mining and qualitative reasoning as the kernel techniques. This platform is composed of TCM active components database, TCM prescription database, drug targets...
In protein structure prediction, identifying the inter-residue contacts is a very important task for understanding the mechanism of complicated protein folding and therefore for predicting the three-dimensional structures of proteins. So far, many methods have been developed to predict inter-residue contacts. However, no special database consisting of detailed inter-residue contacts for each PDB protein chain has...
We designed a new genome search tool for DNA sequence databases, basic sequence search by hashing algorithm (BSSHA), based on the basic local alignment search tool (BLAST) and sequence search and alignment by hashing algorithm (SSAHA). The query sequence is preprocessed by making an m-letter word list. Sequences in the database are preprocessed by breaking them into k-tuples of k contiguous bases and a hash table is...
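The k-tuple preprocessing described in the abstract above can be sketched as a hash table from each k-tuple to the positions where it occurs. The abstract is truncated before it gives details, so this is only an illustration in the general style of SSAHA, not the BSSHA implementation: the names `build_index` and `find_hits`, the non-overlapping default step, and the scanning of overlapping query words are all assumptions.

```python
from collections import defaultdict


def build_index(sequences, k, step=None):
    """Map each k-tuple to a list of (sequence_id, offset) occurrences.

    Hypothetical helper: by default the database sequences are cut into
    non-overlapping k-tuples (step == k), which is an assumption here.
    """
    step = k if step is None else step
    index = defaultdict(list)
    for seq_id, seq in enumerate(sequences):
        for offset in range(0, len(seq) - k + 1, step):
            index[seq[offset:offset + k]].append((seq_id, offset))
    return index


def find_hits(index, query, k):
    """Look up every overlapping k-letter word of the query in the index."""
    hits = []
    for q_offset in range(len(query) - k + 1):
        word = query[q_offset:q_offset + k]
        for seq_id, db_offset in index.get(word, ()):
            hits.append((q_offset, seq_id, db_offset))
    return hits


database = ["ACGTACGT", "TTACGG"]
index = build_index(database, k=2)
hits = find_hits(index, "CAC", k=2)
```

Each hit is a (query offset, sequence id, database offset) triple; a real tool would go on to group and extend such hits into candidate alignments.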