Algorithms in Bioinformatics

chapter

Sequence Database Compression for Peptide Identification from Tandem Mass Spectra

Nathan Edwards, Ross Lippert

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 230-241

The identification of peptides from tandem mass spectra is an important part of many high-throughput proteomics pipelines. In the high-throughput setting, the spectra are typically identified using software that matches tandem mass spectra with putative peptides from amino-acid sequence databases. The effectiveness of these search engines depends heavily on the completeness of the amino-acid sequence...

chapter

Linear Reduction for Haplotype Inference

Jingwu He, Alex Zelikovsky

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 242-253

The haplotype inference problem asks for a set of haplotypes explaining a given set of genotypes. Popular software tools for haplotype inference (e.g., PHASE, HAPLOTYPER), as well as new algorithms recently proposed for perfect phylogeny inference (DPPH), often do not scale well. When the number of sites (SNPs) reaches the thousands, these tools often cannot deliver an answer in reasonable time even if the...
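To make "haplotypes explaining a genotype" concrete, here is a minimal sketch in Python. It assumes a common encoding that the abstract itself does not state: haplotypes are 0/1 vectors, and a genotype site is 0 or 1 when both haplotypes agree on that allele and 2 when they differ. The function name `explains` is hypothetical and is not taken from the paper.

```python
def explains(h1, h2, genotype):
    """Return True if the haplotype pair (h1, h2) explains the genotype.

    Assumed encoding (not from the source): haplotype sites are 0 or 1;
    a genotype site is 0 or 1 if homozygous, 2 if heterozygous.
    """
    if not (len(h1) == len(h2) == len(genotype)):
        return False
    for a, b, g in zip(h1, h2, genotype):
        if g == 2:
            # Heterozygous site: the two haplotypes must differ here.
            if a == b:
                return False
        elif not (a == b == g):
            # Homozygous site: both haplotypes must carry the allele g.
            return False
    return True


# Hypothetical three-site example.
print(explains([0, 1, 0], [0, 0, 1], [0, 2, 2]))
```

Under this encoding, haplotype inference asks for a set of haplotypes such that every input genotype is explained by some pair drawn from the set.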

chapter

A New Integer Programming Formulation for the Pure Parsimony Problem in Haplotype Analysis

Daniel G. Brown, Ian M. Harrower

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 254-265

We present a new integer programming formulation for the haplotype inference by pure parsimony (HIPP) problem. Unlike a previous approach to this problem [2], we create an integer program whose size is polynomial in the size of the input. This IP is substantially smaller for moderate-sized instances of the HIPP problem. We also show several additional constraints, based on the input, that can be added...

chapter

Fast Hare: A Fast Heuristic for Single Individual SNP Haplotype Reconstruction

Alessandro Panconesi, Mauro Sozio

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 266-277

We study the single individual SNP haplotype reconstruction problem. We introduce a simple heuristic and show experimentally that it is very fast and accurate. In particular, when compared with the dynamic programming algorithm of [8], it is much faster and also more accurate. We expect Fast Hare to be very useful in practical applications. We also introduce a combinatorial problem related to the SNP haplotype reconstruction...

chapter

Approximation Algorithms for the Selection of Robust Tag SNPs

Yao-Ting Huang, Kui Zhang, Ting Chen, Kun-Mao Chao

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 278-289

Recent studies have shown that chromosomal recombination takes place only at some narrow hotspots. Within the chromosomal region between these hotspots (called a haplotype block), little or even no recombination occurs, and a small subset of SNPs (called tag SNPs) is sufficient to capture the haplotype pattern of the block. In reality, the tag SNPs may be genotyped as missing data, and we may fail...

chapter

The Minisatellite Transformation Problem Revisited: A Run Length Encoded Approach

Behshad Behzadi, Jean-Marc Steyaert

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 290-301

In this paper we present a more efficient algorithm for the comparison of minisatellites, with complexity O(n’³ + m’³ + mn’² + nm’² + mn), where n and m are the lengths of the maps and n’ and m’ are the sizes of the run-length encoded maps. We show that this algorithm yields a significant improvement on real biological data, dividing the computing time by a factor of 30 on a significant set of data.
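As a small illustration of the quantities n and n’ in the bound above, the following Python sketch run-length encodes a map. It assumes a map is given as a string with one letter per repeat-unit variant; the example string is made up, and the sketch shows only the encoding step, not the comparison algorithm of the paper.

```python
from itertools import groupby


def run_length_encode(seq):
    """Collapse each maximal run of equal symbols into a (symbol, count) pair."""
    return [(symbol, sum(1 for _ in run)) for symbol, run in groupby(seq)]


# Hypothetical map: one letter per repeat-unit variant.
example_map = "aaabbbbcaa"
encoded = run_length_encode(example_map)

n = len(example_map)    # length of the map
n_prime = len(encoded)  # size of the run-length encoded map
print(encoded, n, n_prime)
```

When maps contain long runs of the same variant, n’ is much smaller than n, which is where the cubic terms in the bound become cheap.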

chapter

A Faster and More Space-Efficient Algorithm for Inferring Arc-Annotations of RNA Sequences Through Alignment

Jesper Jansson, See-Kiong Ng, Wing-Kin Sung, Hugo Willy

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 302-313

This paper considers the problem of inferring the optimal nested arc-annotation of a sequence given another nested arc-annotated sequence by maximizing the weighted alignment between the bases and arcs in the two sequences. The problem has a direct application in predicting the secondary structure of an RNA sequence given a closely related sequence whose secondary structure is already known. The currently...

chapter

New Algorithms for Multiple DNA Sequence Alignment

Daniel G. Brown, Alexander K. Hudek

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 314-325

We present a mathematical framework for anchoring in global multiple alignment. Our framework uses anchors that are hits to spaced seeds and identifies anchors progressively, using a phylogenetic tree. We compute anchors in the tree starting at the root and going to the leaves, and from the leaves going up. In both cases, we compute thresholds for anchors to minimize errors. One innovative aspect of...

chapter

Chaining Algorithms for Alignment of Draft Sequence

Mukund Sundararajan, Michael Brudno, Kerrin Small, Arend Sidow, et al.

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 326-337

In this paper we propose a chaining method that can align a draft genomic sequence against a finished genome. We introduce the use of an overlap tree to enhance the state information available to the chaining procedure in the context of sparse dynamic programming, and demonstrate that the resulting procedure more accurately penalizes the various biological rearrangements. The algorithm is tested on...

chapter

Translation Initiation Sites Prediction with Mixture Gaussian Models

Guoliang Li, Tze-Yun Leong, Louxin Zhang

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 338-349

Translation initiation sites (TIS) are important signals in cDNA sequences. Many research efforts have tried to predict TIS in cDNA sequences. In this paper, we propose using mixture Gaussian models to predict TIS in cDNA sequences. Some new global measures are used to generate numerical features from cDNA sequences, such as the length of the open reading frame downstream from ATG, the number of other...

chapter

Online Consensus and Agreement of Phylogenetic Trees

Tanya Y. Berger-Wolf

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 350-361

Computational heuristics are the primary methods for reconstruction of phylogenetic trees on large datasets. Most large-scale phylogenetic analyses produce numerous trees that are equivalent for some optimization criteria. Even using the best heuristics, it takes a significant amount of time to obtain optimal trees in simulation experiments. When biological data are used, the score of the optimal tree...

chapter

Relation of Residues in the Variable Region of 16S rDNA Sequences and Their Relevance to Genus-Specificity

Maciej Liśkiewicz, Hemant J. Purohit, Dhananjay V. Raje

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 362-373

It has been observed that the short nucleotide sequences in a variable region, representing species-level diversity in a set of 16S rDNA sequences, carry the genus-specific signature. In this study, our aim is to assess the relationship of residues at different positions and thereby obtain consensus patterns using different statistical tools. If such patterns are found to be genus-specific, then it would...

chapter

Topological Rearrangements and Local Search Method for Tandem Duplication Trees

Denis Bertrand, Olivier Gascuel

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 374-387

The problem of reconstructing the duplication history of a set of tandemly repeated sequences was first introduced by Fitch (1977). Many recent works deal with this problem, showing the validity of the unequal recombination model proposed by Fitch, describing numerous inference algorithms, and exploring the combinatorial properties of these new mathematical objects, which are duplication trees (DT)...

chapter

Phylogenetic Super-networks from Partial Trees

Daniel H. Huson, Tobias Dezulian, Tobias Klöpper, Mike A. Steel

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 388-399

In practice, one is often faced with incomplete phylogenetic data, such as a collection of partial trees or partial splits. This paper poses the problem of inferring a phylogenetic super-network from such data and provides an efficient algorithm for doing so, called the Z-closure method. Application to a set of five published partial gene trees relating different fungal species illustrates the usefulness...

chapter

Genome Identification and Classification by Short Oligo Arrays

Stanislav Angelov, Boulos Harb, Sampath Kannan, Sanjeev Khanna, et al.

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 400-411

We explore the problem of designing oligonucleotides that help locate organisms along a known phylogenetic tree. We develop a suffix-tree based algorithm to find such short sequences efficiently. Our algorithm requires O(Nm) time and O(N) space in the worst case where m is the number of the genomes classified by the phylogeny and N is their total length. We implemented our algorithm and used it to...

chapter

Novel Tree Edit Operations for RNA Secondary Structure Comparison

Julien Allali, Marie-France Sagot

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 412-425

We describe an algorithm for comparing two RNA secondary structures coded in the form of trees that introduces two novel operations, called node fusion and edge fusion, besides the tree edit operations of deletion, insertion and relabelling classically used in the literature. This allows us to address some serious limitations of the more traditional tree edit operations when the trees represent RNAs...

chapter

The Most Probable Labeling Problem in HMMs and Its Application to Bioinformatics

Broňa Brejová, Daniel G. Brown, Tomáš Vinař

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 426-437

Hidden Markov models (HMMs) are often used for biological sequence annotation. Each sequence element is represented by states with the same label. A sequence should be annotated with the labeling of highest probability. Computing this most probable labeling was shown to be NP-hard by Lyngsø and Pedersen [9]. We improve this result by proving the problem NP-hard for a fixed HMM. High probability labelings...
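The difference between the most probable state path and the most probable labeling can be seen on a toy model. The Python sketch below enumerates every state path by brute force and sums path probabilities per labeling; it is exponential in the sequence length and is meant only as an illustration, not as the method of the paper. The three-state model, its parameters, and the function name `labeling_probabilities` are all hypothetical.

```python
from collections import defaultdict
from itertools import product


def labeling_probabilities(obs, states, label, start, trans, emit):
    """Sum the joint probability of every state path, grouped by labeling.

    Brute force over all len(states) ** len(obs) paths; obs must be non-empty.
    """
    totals = defaultdict(float)
    for path in product(states, repeat=len(obs)):
        p = start[path[0]] * emit[path[0]][obs[0]]
        for prev, cur, symbol in zip(path, path[1:], obs[1:]):
            p *= trans[prev][cur] * emit[cur][symbol]
        totals[tuple(label[s] for s in path)] += p
    return dict(totals)


# Hypothetical toy HMM: two states share label "A", one state has label "B".
states = ["A1", "A2", "B"]
label = {"A1": "A", "A2": "A", "B": "B"}
start = {"A1": 0.3, "A2": 0.3, "B": 0.4}
trans = {s: {t: 1.0 / 3.0 for t in states} for s in states}
emit = {s: {"x": 0.5, "y": 0.5} for s in states}

probs = labeling_probabilities("x", states, label, start, trans, emit)
best_labeling = max(probs, key=probs.get)
print(probs, best_labeling)
```

In this toy model the single most probable state is B, yet the labeling A wins because the probability mass of A1 and A2 adds up. This is why the most probable path does not in general give the most probable labeling.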

chapter

Integrating Sample-Driven and Pattern-Driven Approaches in Motif Finding

Sing-Hoi Sze, Songjian Lu, Jianer Chen

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 438-449

The problem of finding conserved motifs given a set of DNA sequences is among the most fundamental problems in computational biology, with important applications in locating regulatory sites from co-expressed genes. Traditionally, two classes of approaches are used to address the problem: sample-driven approaches focus on finding the locations of the motif instances directly, while pattern-driven...

chapter

Finding Optimal Pairs of Patterns

Hideo Bannai, Heikki Hyyrö, Ayumi Shinohara, Masayuki Takeda, et al.

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 450-462

We consider the problem of finding the optimal pair of string patterns for discriminating between two sets of strings, i.e. finding the pair of patterns that is best with respect to some appropriate scoring function that gives higher scores to pattern pairs which occur more in the strings of one set, but less in the other. We present an O(N²) time algorithm for finding the optimal pair...

chapter

Finding Missing Patterns

Shunsuke Inenaga, Teemu Kivioja, Veli Mäkinen

Lecture Notes in Computer Science > Algorithms in Bioinformatics > Papers > 463-474

Consider the following problem: Find the shortest pattern that does not occur in a given text. To make the problem non-trivial, the pattern is required to consist only of characters that occur in the text. This problem can be solved easily in linear time using the suffix tree of the text. In this paper, we study an extension of this problem, namely the missing patterns problem: Find the shortest pair of patterns...
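The single-pattern version of the problem can be sketched in a few lines of Python. The sketch below is a brute-force enumeration, not the linear-time suffix-tree solution mentioned in the abstract: it tries lengths k = 1, 2, ... and returns the first string over the text's own alphabet that is not a substring. The function name `shortest_missing_pattern` and the lexicographic tie-breaking are assumptions made for illustration.

```python
from itertools import count, product


def shortest_missing_pattern(text):
    """Return a shortest string over the text's alphabet that does not occur in text.

    Brute force: for each length k, collect all substrings of length k and
    test candidates in lexicographic order. Not the suffix-tree method.
    """
    if not text:
        raise ValueError("text must be non-empty")
    alphabet = sorted(set(text))
    for k in count(1):
        seen = {text[i:i + k] for i in range(len(text) - k + 1)}
        for candidate in product(alphabet, repeat=k):
            pattern = "".join(candidate)
            if pattern not in seen:
                return pattern


print(shortest_missing_pattern("abab"))
```

The loop always terminates: a text of length n has no substring of length n + 1, so some candidate is missing by then at the latest.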

INFONA - science communication portal

Algorithms in Bioinformatics
4th International Workshop, WABI 2004, Bergen, Norway, September 17-21, 2004. Proceedings
