In this paper, we present a solution to a special classification problem that we have encountered during the analysis of tandem mass spectrometry data of proteins. First, we present a thorough statistical analysis of our data set. From this analysis, we build a model for the data that allows us to formulate our mass spectrometry data analysis problem as a special kind of classification problem. We...
Snake neurotoxins are important experimental tools in pharmacological research. Over the years, the number of identified snake neurotoxin sequences has increased at a very fast pace. However, of the more than 200,000 variants estimated to exist in nature, only a small portion have been experimentally characterized. In this paper, we report a systematic functional analysis on snake neurotoxins using...
Typical microarray data for ovarian cancer consist of the expression levels of tens of thousands of genes on a genomic scale. To keep the computational complexity manageable, we want to find the most likely differentially expressed genes that best explain the effects of tumor/cancer for ovarian cancer. In this paper, we derive a hybrid approach for extracting and evaluating informative genes from microarray data...
When analyzing biological data sets, a common approach is to partition the data into clusters. Examples of this include finding a subset of genes with co-regulated expression among experiments, grouping similar disease phenotypes, or implicating regions of genetic variation in disease. The ability to separate the data into subsets depends upon the structure of the distribution of points and the choice...
One of the important goals in the post-genomic era is to identify the functions of genes, either individually or as a group. Recently, there has been an increasing use of the gene ontology (GO) to analyze a list of genes identified via various statistical and/or computational methods. The main assumption behind using GO for interpreting microarray data is that the genes that belong to similar molecular...
Genes can cause cell phenotypes to change dramatically. Biologists currently perform genome-wide RNAi screening to identify various image phenotypes. Recognizing the phenotypes automatically is a challenging task because of the noisy background and low contrast of fluorescence images. In this work, we applied two cellular segmentation techniques, deformable...
The analysis of alignments of functionally equivalent proteins can reveal regularities, such as correlated positions or residue patterns, which are important to ensure a specific fold and various cellular functions. Many approaches in the literature try to identify correlated positions to predict the residues that are close to each other in the three-dimensional folded structure. However,...
Genome-wide association studies of complex diseases generate massive amounts of single nucleotide polymorphism (SNP) data. A univariate statistical test (i.e. the Fisher exact test) was used to single out non-associated SNPs. However, the disease-susceptible SNPs may have little marginal effect in the population and are unlikely to be retained after the univariate tests. Also, model-based methods are impractical...
We consider computationally reconstructing gene regulatory networks on top of the binary abstraction of gene expression state information. Unlike previous Boolean network approaches, the proposed method does not handle noisy gene expression values directly. Instead, two-valued "hidden state" information is derived from gene expression profiles using a robust statistical technique, and a...
Over the last several years there has been an explosion of microarray technology in the biosciences, medical sciences, biotechnology, and pharmaceutical industry. The technology has centered on providing a platform for determining the gene expression profiles of hundreds to tens of thousands of genes (or transcript levels of RNA species) in tissue, tumors, cells, or biological fluids in a single experiment...
Identification of transmembrane segments in protein sequences is an important issue in the field of bioinformatics. In this study, a method is proposed for linear discrimination between transmembrane and non-transmembrane segments, combining chemical and statistical features of the proteins with higher-order crossings analysis for protein segment classification. The method was tested on human proteins...
Microscopists are familiar with many blemishes that fluorescence images can have due to dust and debris, glass flaws, uneven distribution of fluids or surface coatings, etc. Microarray scans do show similar artifacts, which might affect subsequent analysis. We developed a tool, Harshlight, for the detection and masking of blemishes in HDONA microarray chips. Harshlight uses a combination of statistic...
E. coli promoter recognition is an area of great interest in bioinformatics. In this paper, we describe the implementation of a feed-forward neural network to predict E. coli promoters. Based on sequence conservation, some sequences of 60 bases are selected as positive samples, some corresponding non-promoters from E. coli coding areas are selected as negative samples, and a classifier...
For the critical task of gene module discovery in genomic research, we present a model-based hierarchical data clustering and visualization algorithm, visual statistical data analyzer (VISDA), which effectively exploits human-data interaction to improve the clustering outcome. Guided by a diagnostic tree, we apply VISDA to a muscular dystrophy dataset that contains a number of different phenotypic...
When the same set of genes appears in two top-ranking gene lists from two different studies, it is often of interest to estimate the probability that this is a chance event. This overlap probability is well known to follow the hypergeometric distribution. Usually, the lengths of the top-ranking gene lists are assumed to be fixed, by using a pre-set criterion on, e.g., the p-value for the t-test. We investigate...
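As an illustration of the fixed-length case mentioned in this abstract, the chance of a given overlap between two gene lists can be sketched with the standard hypergeometric formula. This is a minimal sketch, not the method of the paper: the function names and the parameters `N` (total number of genes), `n1` and `n2` (list lengths), and `k` (number of shared genes) are assumptions introduced here, and both lists are assumed to be drawn uniformly at random from the same `N` genes.

```python
from math import comb


def overlap_pmf(k, N, n1, n2):
    """Probability that exactly k genes are shared by two random lists.

    Hypothetical helper: the lists have fixed lengths n1 and n2 and are
    drawn without replacement from the same pool of N genes.
    """
    # Overlaps outside this range are impossible.
    if k < max(0, n1 + n2 - N) or k > min(n1, n2):
        return 0.0
    # Choose k shared genes from list 1, and the remaining n2 - k
    # genes of list 2 from the N - n1 genes not in list 1.
    return comb(n1, k) * comb(N - n1, n2 - k) / comb(N, n2)


def overlap_pvalue(k, N, n1, n2):
    """Probability of an overlap of at least k genes arising by chance."""
    return sum(overlap_pmf(i, N, n1, n2) for i in range(k, min(n1, n2) + 1))
```

For example, `overlap_pvalue(k, N, n1, n2)` gives the upper-tail probability that two lists of lengths `n1` and `n2` share `k` or more genes. The sketch does not cover the case in which the list lengths themselves vary.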
A latent-threshold model and misclassification algorithm were implemented to examine potential misdiagnosis among 16 Alzheimer's disease (AD) subjects using gene expression data. Results obtained without invoking the misclassification algorithm showed limited predictive power of the model. When the misclassification algorithm was invoked, four subjects were identified as being potentially misdiagnosed...
The aim of the study was to test the reproducibility of estimates of static, Phis, and dynamic, Phid, beta-cell sensitivity to glucose, and of predictions of the insulin secretion rate, SR(t), provided by the C-peptide oral minimal model (COMM) applied to oral glucose tolerance tests (OGTT) of various complexity. The study involved six volunteer, normotensive and normoglycemic subjects who underwent a 300-minute...