As many diseases are known to be related to microbes, interest in statistical methods for Microbiome-Wide Association Studies (MWAS) is also increasing. Accordingly, we systematically investigate the properties of statistical methods for MWAS and compare their performance using simulation data generated from Human Microbiome Project data. We first assessed the type I error rates of eight commonly...
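The type I error assessment described above can be illustrated with a small, self-contained sketch: simulate many datasets under the null hypothesis, apply a test, and count how often it rejects at the nominal level. This is a generic illustration using a simple permutation test on Gaussian data, not the MWAS-specific methods or Human Microbiome Project data from the abstract.

```python
import random

random.seed(0)

def permutation_test(x, y, n_perm=200):
    """Two-sample permutation test on the difference of means.

    Returns a p-value: the fraction of label permutations whose mean
    difference is at least as extreme as the observed one.
    """
    observed = abs(sum(x) / len(x) - sum(y) / len(y))
    pooled = x + y
    hits = 0
    for _ in range(n_perm):
        random.shuffle(pooled)
        px, py = pooled[:len(x)], pooled[len(x):]
        if abs(sum(px) / len(px) - sum(py) / len(py)) >= observed:
            hits += 1
    return hits / n_perm

def type_i_error_rate(n_datasets=200, n=15, alpha=0.05):
    """Simulate datasets under the null (both groups drawn from the
    same distribution) and report how often the test rejects."""
    rejections = 0
    for _ in range(n_datasets):
        x = [random.gauss(0, 1) for _ in range(n)]
        y = [random.gauss(0, 1) for _ in range(n)]
        if permutation_test(x, y) < alpha:
            rejections += 1
    return rejections / n_datasets

rate = type_i_error_rate()
# For a well-calibrated test, the rejection rate under the null
# should be close to the nominal alpha (here 0.05).
```

The same scaffold works for any test statistic: swap `permutation_test` for the method under study and compare the empirical rejection rate to the nominal level.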
The Oxford Nanopore and PacBio SMRT sequencing technologies have revolutionized the Next-Generation Sequencing (NGS) environment by producing long reads that exceed 60 kbp, and have contributed to the completion of many biological projects. However, long reads are characterized by a high error rate, which increases the difficulty of biological problems such as genome assembly. Error correction of long reads...
Mate-pair sequencing is a technology for sequencing the two ends of long DNA fragments, and has been widely used in genome scaffolding. Although the cost of mate-pair sequencing is now affordable, its accuracy is limited by lower read quality and contamination. Third-generation sequencing can generate long reads for genome scaffolding; however, its error rates and cost are still too high...
Tuning bioinformatics pipelines and training software parameters require sequencing data with a known ground truth, which is difficult to obtain from real sequencing data. In particular, for applications that detect low-frequency variants (such as ctDNA sequencing), it is hard to tell whether a called variant is a true positive, or a false positive caused by errors from sequencing or other...
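A minimal sketch of how such ground-truth data can be produced: extract reads from a reference and inject substitution errors at known positions, so every difference in the output is a recorded, verifiable truth. The reference, read length and error rate below are arbitrary toy values, not the simulator from the abstract.

```python
import random

random.seed(1)
BASES = "ACGT"

def simulate_read(reference, start, length, error_rate):
    """Extract a read from `reference` and inject substitution errors
    at a known rate, returning the read plus the ground-truth list of
    (position_in_read, original_base, observed_base) errors."""
    read = list(reference[start:start + length])
    truth = []
    for i, base in enumerate(read):
        if random.random() < error_rate:
            observed = random.choice([b for b in BASES if b != base])
            truth.append((i, base, observed))
            read[i] = observed
    return "".join(read), truth

# Toy reference and a single simulated read with a 10% error rate.
reference = "".join(random.choice(BASES) for _ in range(1000))
read, truth = simulate_read(reference, start=100, length=150, error_rate=0.1)
# Every recorded error position really differs from the reference,
# and every other position matches it exactly.
```

Because the error list is known, any variant caller run on such reads can be scored exactly: calls at recorded positions are true positives, everything else is a false positive.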
One approach to correcting the errors produced by third-generation sequencing technologies is to exploit the high coverage of high-quality short reads generated by second-generation sequencing. This paper presents a new approach to error correction and de novo assembly for long reads. We present MiRCA, a hybrid approach based on sequence alignments that detects and corrects...
Cancer classification based on molecular-level investigation has gained the interest of researchers, as it provides a systematic, accurate and objective diagnosis for different cancer types. It has also been applied in a wide range of applications such as drug discovery and cancer prediction and diagnosis, which are very important issues for cancer treatment. Besides, it helps in understanding the function...
Next-generation sequencing (NGS) technologies have superseded the traditional Sanger sequencing approach in many experimental settings, given their tremendous yield and affordable cost. Nowadays it is possible to sequence any microbial organism or metagenomic sample within hours, and to obtain a whole human genome in weeks. Nonetheless, NGS technologies are error-prone. Correcting errors is a challenge...
While most current high-throughput DNA sequencing technologies generate short reads with low error rates, emerging sequencing technologies generate long reads with high error rates. A basic question of interest is the tradeoff between read length and error rate in terms of the information needed for the perfect assembly of the genome. Using an adversarial erasure error model, we make progress on this...
Intercellular heterogeneity serves as both a confounding factor in studying individual clones and an information source in characterizing heterogeneous tissues such as blood and tumor systems. Due to inevitable sequencing errors and other technical artifacts such as PCR errors, systematic efforts to characterize intercellular genomic heterogeneity must effectively distinguish genuine clonal sequences...
Bolstered error estimation has been shown to perform better than cross-validation and competitively with bootstrap in small-sample settings. However, its performance can deteriorate in the high-dimensional settings prevalent in Genomic Signal Processing. We propose here a modification of Bolstered error estimation that is based on the principle of Naive Bayes. Rather than attempting to estimate a...
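For context, a minimal Monte Carlo sketch of plain bolstered resubstitution, the baseline the abstract builds on, not the proposed Naive Bayes modification: each training point is replaced by a Gaussian bolstering kernel, and the error estimate is the average kernel mass the classifier assigns to the wrong label. The 1-D data, threshold classifier and kernel width below are illustrative assumptions.

```python
import random

random.seed(2)

def bolstered_resubstitution(samples, labels, classify, sigma, n_mc=500):
    """Monte Carlo bolstered resubstitution error for a 1-D classifier.

    Each training point is replaced by a Gaussian 'bolstering' kernel of
    width sigma; the estimate averages, over points, the fraction of
    kernel draws that the classifier labels incorrectly."""
    total = 0.0
    for x, y in zip(samples, labels):
        wrong = sum(classify(random.gauss(x, sigma)) != y for _ in range(n_mc))
        total += wrong / n_mc
    return total / len(samples)

# Toy 1-D problem: class 0 near -1, class 1 near +1, threshold at 0.
samples = [-1.2, -0.9, -1.1, 1.0, 0.8, 1.3]
labels = [0, 0, 0, 1, 1, 1]
classify = lambda x: int(x > 0)

plain_resub = sum(classify(x) != y
                  for x, y in zip(samples, labels)) / len(samples)
bolstered = bolstered_resubstitution(samples, labels, classify, sigma=0.5)
# Plain resubstitution is 0.0 on this separable sample; the bolstered
# estimate is small but positive, reflecting points near the boundary.
```

The contrast shows why bolstering helps in small samples: plain resubstitution reports zero error for any separable training set, while the kernel mass leaking across the decision boundary restores a nonzero, less optimistic estimate.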
With the development of DNA microarray technology, scientists can now measure gene expression levels. However, such high-throughput microarray technologies produce long lists of genes from small sample sizes, with many noisy genes. The data need further analysis, and interpreting information on biological processes requires a lot of practice and is usually time-consuming. Most of the traditional...
A viral quasispecies is a set of related variants in a virus population (e.g. from an infected patient) that contain similar mutations due to the rapid, mutation-prone replication of viruses. The characterization of viral quasispecies in a highly divergent virus population is of great interest in biomedical research, in particular to identify virulent and drug-resistant mutations in...
Next-generation sequencing (NGS) technologies are laying the foundations for a new paradigm in genomics and transcriptomics. Nowadays it is possible to sequence any microbial organism or metagenomic sample within hours, and to obtain a whole human genome in less than a month. Sequencing prices are decreasing dramatically, opening the way to actual personalised medicine. NGS technologies, however, are error-prone,...
The problem of inferring family trees, or pedigree reconstruction, for a group of individuals has attracted a lot of attention recently. Various methods have been proposed to automate the process of pedigree reconstruction given the genotypes or haplotypes of a set of individuals. The state-of-the-art method IPED is able to reconstruct large pedigrees with reasonable accuracy. However, the algorithm...
Biomarker discovery and classification in medical applications both typically involve feature selection applied to a small-sample high-dimensional dataset. Recent work has proposed a framework to integrate a prior over an uncertainty class of parameterized feature-label distributions with training data to obtain optimal classifiers, MMSE classifier error estimates, and evaluate the MSE of error estimates...
The development of statistical pathway analysis methods has focused on testing individual main effects of genes in a pathway on disease. However, gene-gene interactions can also play an important role in complex disease etiology. We developed a pathway analysis method based on a protein-protein interaction network to account for gene-gene interactions in a pathway. We used simulations to evaluate...
We study the problem of base calling in next-generation DNA sequencing platforms that rely on reversible terminator chemistry. After reviewing a statistical model of the generated signal and the Viterbi algorithm for finding the maximum-likelihood solution to the base calling problem, we present a closed-form expression for the upper bound on the probability of base calling error. Simulation results...
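A log-space Viterbi implementation for a generic HMM can illustrate the maximum-likelihood decoding step mentioned above; the two-state alphabet and the emission and transition probabilities in the toy example are assumptions for illustration, not the paper's signal model for reversible terminator chemistry.

```python
import math

def viterbi(obs, states, log_init, log_trans, log_emit):
    """Maximum-likelihood state sequence for an HMM (log-space Viterbi).

    obs: sequence of observations; states: list of hidden states;
    log_init[s], log_trans[s][t], log_emit[s][o] are log-probabilities."""
    # delta[s] = best log-probability of any path ending in state s.
    delta = {s: log_init[s] + log_emit[s][obs[0]] for s in states}
    back = []
    for o in obs[1:]:
        prev, delta, ptr = delta, {}, {}
        for t in states:
            best_s = max(states, key=lambda s: prev[s] + log_trans[s][t])
            delta[t] = prev[best_s] + log_trans[best_s][t] + log_emit[t][o]
            ptr[t] = best_s
        back.append(ptr)
    # Trace the best path backwards through the stored pointers.
    state = max(states, key=lambda s: delta[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]

# Toy two-base model: observations 'a'/'c' favour states 'A'/'C'.
states = ["A", "C"]
log_init = {"A": math.log(0.5), "C": math.log(0.5)}
log_trans = {s: {t: math.log(0.5) for t in states} for s in states}
log_emit = {"A": {"a": math.log(0.9), "c": math.log(0.1)},
            "C": {"a": math.log(0.1), "c": math.log(0.9)}}
decoded = viterbi("aacc", states, log_init, log_trans, log_emit)
# -> ['A', 'A', 'C', 'C']: each symbol decoded to its most likely base.
```

In a real base caller the states would be the four bases per sequencing cycle and the emissions would come from the platform's fluorescence-intensity model; the dynamic program itself is unchanged.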
High-throughput genotyping technology has made genome-wide association studies possible. Single nucleotide polymorphism (SNP) data derived from array-based technology are usually flawed by missing data, although they generally have high call rates and good concordance rates across different genotype calling schemes. Missing SNPs can bias the results of association analyses, and hence loci with missing...
DNA sequencing technology has played an important role in the life sciences, especially Illumina's Solexa sequencer, which has been used for more and more genome projects. Solexa libraries are usually constructed with insert sizes of 200 bp, 500 bp, 2 kb, 5 kb and 10 kb in genome projects. It remains a problem how to find the optimal combination of different insert sizes and different depths of Solexa sequencing libraries...
Error estimation is a crucial part of any classification problem and it becomes problematic with small samples. In this paper, we analyze the performance of some widely used error estimation methods relative to the complexity of the feature-label distribution: resubstitution, 10-fold cross validation with repetition (CV10r), leave-one-out (LOO), bootstrap .632, and bolstered resubstitution. Our definition...
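Two of the estimators compared above, resubstitution and leave-one-out, can be sketched for a toy 1-D nearest-mean classifier; the classifier and the Gaussian sample below are illustrative assumptions, not the paper's experimental setup.

```python
import random

random.seed(3)

def fit_nearest_mean(xs, ys):
    """Fit a two-class nearest-mean classifier on 1-D data."""
    c0 = [x for x, y in zip(xs, ys) if y == 0]
    c1 = [x for x, y in zip(xs, ys) if y == 1]
    mu0, mu1 = sum(c0) / len(c0), sum(c1) / len(c1)
    return lambda x: int(abs(x - mu1) < abs(x - mu0))

def resubstitution_error(xs, ys):
    """Error of the classifier evaluated on its own training data."""
    clf = fit_nearest_mean(xs, ys)
    return sum(clf(x) != y for x, y in zip(xs, ys)) / len(xs)

def loo_error(xs, ys):
    """Leave-one-out: hold out each point, retrain, test on it."""
    errors = 0
    for i in range(len(xs)):
        clf = fit_nearest_mean(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        errors += clf(xs[i]) != ys[i]
    return errors / len(xs)

# Small sample: class 0 centred at -1, class 1 at +1.
xs = [random.gauss(-1, 1) for _ in range(10)] + \
     [random.gauss(1, 1) for _ in range(10)]
ys = [0] * 10 + [1] * 10
r, l = resubstitution_error(xs, ys), loo_error(xs, ys)
# Resubstitution is typically optimistically biased (low), while
# leave-one-out is nearly unbiased but has higher variance in small samples.
```

Repeating this over many drawn samples and plotting each estimate against the true error is the standard way to visualise the bias/variance trade-off the abstract analyses.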