The use of ontologies presents a novel data integration resource when centred on semantic definitions and the need for interoperability. Results from previous works indicate that ontologies can drive knowledge acquisition processes for the purpose of comprehensive, transportable machine understanding and knowledge management. Applied to the biodiversity domain, ontologies can be a valuable resource...
Unlike most conventional techniques, which assume a static model, this paper aims to estimate time-varying model parameters and identify the significant genes involved at different timepoints from time-course gene microarray data. We first formulate the parameter identification problem as a new maximum a posteriori probability estimation problem so that prior information can be incorporated as regularization...
Recent work proposes a Bayesian hierarchical model for feature selection in which priors are placed over the identity of each feature, as well as over the underlying feature-label distribution. Given data, Bayesian inference can be used to find a maximum posterior probability feature set. In this work, we examine the application of this theory to microarray data for biomarker discovery. A major challenge...
With the development of DNA microarray technology, scientists can now measure gene expression levels. However, such high-throughput microarray technologies produce a long list of genes with a small sample size and many noisy genes. The data need to be further analysed, and interpreting information on biological processes requires a lot of practice and is usually time-consuming. Most of the traditional...
This paper presents the budgeted transcript discovery problem (BTD): deciding how to spend a given research budget collecting data, using a combination of microarrays and PCRs, to discover which transcripts are differentially expressed with respect to a given phenotype. We present algorithms that address this task by sequentially analyzing the data collected so far, to decide which data would be most...
Coreference resolution has recently come to play an increasingly important role in many natural language processing tasks. In this paper, we propose two methods for biomedical coreference resolution. One is a single machine learning method (an SVM ranker-learning algorithm) which selects appropriate features for pronoun and noun phrase coreference resolution, respectively. The other one is the hybrid...
Many scientific experiments are designed as computational workflows in bioinformatics. However, the amount of data generated increases at every phase of each execution, hindering the identification of the source and the transformation of data. Therefore, it has become necessary to create new tools to store data provenance, mainly which resources and parameters were used to generate the results, among...
Bioinformatics datasets have historically been difficult to work with. However, within machine learning, there is a potentially effective tool to combat such problems: ensemble learning. Ensemble learning generates a series of models and combines their results to make a single decision. This process has the benefit of utilizing the power of multiple models but the overhead of having to compute the...
In the domain of bioinformatics, two common problems encountered when analyzing real-world datasets are class imbalance and high dimensionality. Boosting is a technique that can be used to improve classification performance, even in the presence of class imbalance. In addition, data sampling and feature selection are two important preprocessing techniques used to counter the adverse effects of both...
Rapid development of genome sequencing technologies enables novel insights into the mechanisms of complex disease through Big Data analysis. Physicians can nowadays assay a patient's gene variants and gene expression patterns in a timely manner and use the obtained data to study an individual's susceptibility to complex disease and unravel the underlying mechanisms of disease pathogenesis. Massive...
Long non-coding RNAs (lncRNAs) have been implicated in various biological processes and are linked to many dysregulations. Researchers have reported a large number of lncRNA-associated human diseases over the past decade. In this article we employed the Non-negative Matrix Factorization method to develop a low-dimensional computational model that can describe the existing knowledge about lncRNA-disease...
Increasingly complex biomedical data from diverse sources demand large storage, efficient software and high performance computing for their computationally intensive analysis. Cloud technology provides flexible storage and data processing capacity to aggregate and analyze complex data, facilitating knowledge sharing and integration from different disciplines in a collaborative research environment...
Modeling of gene regulatory networks plays an important role in the post-genomic era. In this work, we propose a Bayesian inference-based model to quantitatively analyze the transcriptional regulatory network when the structure of the regulatory network is given. In the proposed model, the dynamics of transcription factors are treated as a Markov process. Besides, the sequence features of genes are employed...
Identification of biomarkers from high-dimensional data is one of the most important emerging topics in genomics and personalized medicine. Gene selection aims to find a parsimonious subset of features that has the most discriminative information for a specific disease. The variations in real clinical tests have a great impact on diagnostic efficiency. This influence makes producing stable or robust...
In this paper, we propose an exact method to discover all order-preserving submatrices (OPSMs) based on frequent sequential pattern mining. Firstly, an existing algorithm, calACS, is adjusted to disclose all common subsequences between every two row sequences, so that none of the deep OPSMs corresponding to long patterns with few supporting sequences are missed. Then an improved data structure...
Time-course gene expression profiling provides valuable data on the dynamic behavior of cellular responses to external stimulation. Investigation of transcription factors (TFs) that regulate co-expressed genes in a dynamic process can reveal insights into the underlying molecular mechanisms. As the ChIP-seq technology is only suitable for a fraction of TFs in mammalian organisms, the computational identification...
The process of whole genome doubling (WGD) gives rise to two copies of each chromosome in a genome, containing the same genes in the same order. Through an attrition mechanism known as fractionation, one of each pair of duplicate genes is lost over evolutionary time, resulting in an interleaving pattern of deletions from duplicated regions [1]. This differentiates the WGD/fractionation model from...
We propose a comprehensive information processing, knowledge discovery and simulation platform for Big Data Healthcare. In addition, we present a related, well-defined workflow that promotes model-guided personalized medicine. We start by identifying disease signatures and homogeneous patient groups, whilst modeling case-based patient similarity. Then we analyze correlations between variables and...
Bioinformatics tools require large-scale processing, mainly due to very large databases reaching gigabytes in size. In federated cloud environments, although services and resources may be shared, storage is particularly difficult due to the distinct computational capabilities and data management policies of several separate clouds. In this work, we propose a storage policy for BioNimbuZ, a hybrid federated...