Combinatorial Pattern Matching

chapter

Explaining and Controlling Ambiguity in Dynamic Programming

Robert Giegerich

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 46-59

Ambiguity in dynamic programming arises from two independent sources: the non-uniqueness of optimal solutions and the particular recursion scheme by which the search space is evaluated. Unless it is explicitly considered, ambiguity leads to unnecessarily complicated, inflexible, and sometimes even incorrect dynamic programming algorithms. Building upon the recently developed algebraic approach to dynamic...
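To make the first source of ambiguity concrete, here is a minimal Python sketch (an illustration of our own, not the algebraic approach of the paper): unit-cost edit distance computed together with a second table that counts co-optimal alignments. The count is only meaningful because this three-case recursion derives each alignment exactly once; a recursion that could reach the same alignment along several paths would overcount, which is the second source of ambiguity the abstract names. The function name is hypothetical.

```python
def edit_distance_with_count(a, b):
    """Unit-cost edit distance of a and b, plus the number of co-optimal alignments."""
    n, m = len(a), len(b)
    # dist[i][j]: edit distance of a[:i] and b[:j]
    # count[i][j]: number of distinct alignments of a[:i], b[:j] achieving dist[i][j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    count = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0], count[i][0] = i, 1
    for j in range(m + 1):
        dist[0][j], count[0][j] = j, 1
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            candidates = [
                (dist[i - 1][j - 1] + (a[i - 1] != b[j - 1]), count[i - 1][j - 1]),  # match/substitute
                (dist[i - 1][j] + 1, count[i - 1][j]),  # delete a[i-1]
                (dist[i][j - 1] + 1, count[i][j - 1]),  # insert b[j-1]
            ]
            best = min(c for c, _ in candidates)
            dist[i][j] = best
            # Sum the counts of every predecessor that attains the optimum.
            count[i][j] = sum(k for c, k in candidates if c == best)
    return dist[n][m], count[n][m]
```

For example, "ab" versus "ba" has distance 2 and three co-optimal alignments: two substitutions, delete-match-insert, or insert-match-delete.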

chapter

A Dynamic Edit Distance Table

Sung-Ryul Kim, Kunsoo Park

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 60-68

In this paper we consider the incremental/decremental version of the edit distance problem: given a solution to the edit distance between two strings A and B, find a solution to the edit distance between A and B′, where B′ = aB (incremental) or bB′ = B (decremental). As a solution for the edit distance between A and B, we define the difference representation of the D-table, which leads to a simple and intuitive...
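The abstract does not spell out the difference representation, so the Python sketch below only shows the standard D-table and one simple difference encoding (horizontally adjacent entries, which for unit costs always differ by -1, 0 or +1). It handles the incremental case B′ = aB by naive recomputation, the O(|A||B|) baseline an incremental algorithm aims to beat. The function names are hypothetical.

```python
def d_table(a, b):
    """Standard unit-cost edit-distance table: D[i][j] = distance of a[:i] and b[:j]."""
    D = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        D[i][0] = i
    for j in range(len(b) + 1):
        D[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            D[i][j] = min(D[i - 1][j] + 1,                          # deletion
                          D[i][j - 1] + 1,                          # insertion
                          D[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # match/substitution
    return D


def horizontal_differences(D):
    """Row-wise differences D[i][j] - D[i][j-1]; with unit costs each is in {-1, 0, 1}."""
    return [[row[j] - row[j - 1] for j in range(1, len(row))] for row in D]


A, B = "kitten", "itting"
D = d_table(A, B)
# Incremental case B' = aB with a = "s": recomputed from scratch here as a baseline.
D_inc = d_table(A, "s" + B)
print(D_inc[-1][-1])  # edit distance of "kitten" and "sitting"
```

An incremental algorithm would instead update the stored representation when the new first character of B′ arrives, rather than rebuilding every row.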

chapter

Parametric Multiple Sequence Alignment and Phylogeny Construction

David Fernández-Baca, Timo Seppäläinen, Giora Slutzki

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 69-83

Bounds are given on the size of the parameter-space decomposition induced by multiple sequence alignment problems where phylogenetic information may be given or inferred. It is shown that many of the usual formulations of these problems fall within the same integer parametric framework, implying that the number of distinct optima obtained as the parameters are varied across their ranges is polynomially...

chapter

Tsukuba BB: A Branch and Bound Algorithm for Local Multiple Sequence Alignment

Paul Horton

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 84-98

In this paper we present a branch and bound algorithm for local gapless multiple sequence alignment (motif alignment) and its implementation. This is the first program to exploit the fact that the motif alignment problem is easier for short motifs. Indeed, for a fixed motif width, the running time of the algorithm is asymptotically linear in the size of the input. We tested the performance of the program...
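To make the motif alignment problem concrete, here is an exhaustive Python baseline. It is not Tsukuba BB. It assumes a simple sum-of-pairs score (number of matching character pairs over all columns) and tries every combination of start offsets, which is exponential in the number of sequences. A branch and bound method would prune offset combinations whose score upper bound cannot beat the best alignment found so far. The scoring function and names are assumptions for illustration.

```python
from itertools import combinations, product


def sum_of_pairs(windows):
    """Count matching character pairs, summed over all columns and all pairs of windows."""
    return sum(x[k] == y[k] for x, y in combinations(windows, 2) for k in range(len(x)))


def best_motif_alignment(seqs, width):
    """Exhaustively choose one gapless window of the given width per sequence.

    Returns (best score, tuple of start offsets). Exponential in len(seqs).
    """
    ranges = [range(len(s) - width + 1) for s in seqs]
    best_score, best_offsets = -1, None
    for offsets in product(*ranges):
        windows = [s[o:o + width] for s, o in zip(seqs, offsets)]
        score = sum_of_pairs(windows)
        if score > best_score:
            best_score, best_offsets = score, offsets
    return best_score, best_offsets
```

For instance, `best_motif_alignment(["xxACGxx", "ACGyy", "zACG"], 3)` aligns the shared motif "ACG" at offsets (2, 0, 1).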

chapter

A Polynomial Time Approximation Scheme for the Closest Substring Problem

Bin Ma

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 99-107

In this paper we study the following problem: Given $n$ strings $s_1, s_2, \ldots, s_n$, each of length $m$, find a substring $t_i$ of length $L$ for each $s_i$, and a string $s$ of length $L$, such that $\max_{1 \le i \le n} d(s, t_i)$ is minimized, where $d(\cdot, \cdot)$ is the Hamming distance. The problem was raised in [6] in an application of genetic drug target search and is a key open problem in many applications...
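The objective can be evaluated directly: for a fixed center s, each t_i is simply the substring of s_i closest to s in Hamming distance. The Python sketch below uses this to solve tiny instances exactly by trying every center in Σ^L. It is exponential in L and is only meant to make the definition concrete; the paper's contribution is a polynomial time approximation scheme, which this is not. Function names and the explicit `alphabet` parameter are hypothetical.

```python
from itertools import product


def hamming(x, y):
    """Hamming distance of two equal-length strings."""
    return sum(c != d for c, d in zip(x, y))


def cost(center, strings):
    """max over i of the smallest Hamming distance between center and a length-L substring of s_i."""
    L = len(center)
    return max(min(hamming(center, s[k:k + L]) for k in range(len(s) - L + 1))
               for s in strings)


def closest_substring_bruteforce(strings, L, alphabet):
    """Exact optimum by trying every center in alphabet^L (tiny inputs only)."""
    best = min(product(alphabet, repeat=L), key=lambda c: cost("".join(c), strings))
    center = "".join(best)
    return cost(center, strings), center
```

For example, "ACGT" and "TACG" share the substring "ACG", so with L = 3 the optimum is 0 with center "ACG".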

chapter

Approximation Algorithms for Hamming Clustering Problems

Leszek Gąsieniec, Jesper Jansson, Andrzej Lingas

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 108-118

We study Hamming versions of two classical clustering problems. The Hamming radius p-clustering problem (HRC) for a set S of k binary strings, each of length n, is to find p binary strings of length n that minimize the maximum Hamming distance between a string in S and the closest of the p strings; this minimum value is termed the p-radius of S and is denoted by ϱ. The related Hamming diameter p-clustering...
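The definition of the p-radius can be checked on small inputs with a brute-force Python sketch. It computes the radius of a given set of centers and then minimizes it over all p-subsets of {0,1}^n. It is exponential and only illustrates the objective, not the paper's approximation algorithms. Function names are hypothetical.

```python
from itertools import combinations, product


def hamming(x, y):
    """Hamming distance of two equal-length binary strings."""
    return sum(c != d for c, d in zip(x, y))


def radius(S, centers):
    """Maximum over s in S of the Hamming distance from s to its closest center."""
    return max(min(hamming(s, c) for c in centers) for s in S)


def p_radius_bruteforce(S, p):
    """Exact p-radius of S by trying all p-subsets of {0,1}^n (tiny n only)."""
    n = len(S[0])
    candidates = ["".join(bits) for bits in product("01", repeat=n)]
    return min(radius(S, centers) for centers in combinations(candidates, p))
```

For S = {000, 011, 111}, one center cannot get within distance 1 of both 000 and 111, so the 1-radius is 2; with two centers (000 and 111) the 2-radius drops to 1.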

chapter

Approximating the Maximum Isomorphic Agreement Subtree Is Hard

Paola Bonizzoni, Gianluca Vedova, Giancarlo Mauri

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 119-128

The Maximum Isomorphic Agreement Subtree (MIT) problem is one of the simplest versions of the Maximum Interval Weight Agreement Subtree method (MIWT), which is used to compare phylogenies. More precisely, MIT provides a subset of the species such that the exact distances between species in that subset are preserved across all evolutionary trees considered. In this paper, the approximation complexity...

chapter

A Faster and Unifying Algorithm for Comparing Trees

Ming-Yang Kao, Tak-Wah Lam, Wing-Kin Sung, Hing-Fung Ting

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 129-142

A widely used method for determining the similarity of two labeled trees is to compute a maximum agreement subtree of the two trees. Previous work on this similarity measure concerns only the comparison of labeled trees of two special kinds, namely, uniformly labeled trees (i.e., trees with all their nodes labeled by the same symbol) and evolutionary trees (i.e., leaf-labeled trees with distinct...

chapter

Incomplete Directed Perfect Phylogeny

Itsik Pe’er, Ron Shamir, Roded Sharan

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 143-153

Perfect phylogeny is one of the fundamental models for studying evolution. We investigate the following generalization of the problem: The input is a species-characters matrix. The characters are binary and directed, i.e., a species can only gain characters. The difference from standard perfect phylogeny is that for some species the state of some characters is unknown. The question is whether one...
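For complete data (no unknown entries), whether a binary directed character matrix admits a perfect phylogeny has a well-known pairwise test: with an all-zero ancestor, a tree exists if and only if, for every pair of characters, the sets of species having them are either nested or disjoint. The Python sketch below implements only that complete case. Deciding whether the unknown entries of an incomplete matrix can be filled so that the test passes is the harder problem the paper studies. The function name is hypothetical.

```python
def has_directed_perfect_phylogeny(matrix):
    """matrix[s][c] == 1 iff species s has gained character c (complete 0/1 data).

    Assumes a directed model with an all-zero ancestor: a perfect phylogeny exists
    iff every pair of character species-sets is nested or disjoint.
    """
    n_chars = len(matrix[0]) if matrix else 0
    species_sets = [{s for s, row in enumerate(matrix) if row[c] == 1}
                    for c in range(n_chars)]
    for i in range(n_chars):
        for j in range(i + 1, n_chars):
            a, b = species_sets[i], species_sets[j]
            # Overlapping but non-nested sets would force a character to arise twice.
            if a & b and not (a <= b or b <= a):
                return False
    return True
```

In the matrix [[1, 1], [1, 0], [0, 1]] the two characters are shared by species 0 but each also occurs without the other, so no directed perfect phylogeny exists.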

chapter

The Longest Common Subsequence Problem for Arc-Annotated Sequences

Tao Jiang, Guo-Hui Lin, Bin Ma, Kaizhong Zhang

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 154-165

Arc-annotated sequences are useful in representing the structural information of RNA and protein sequences. Recently, the longest arc-preserving common subsequence problem has been introduced in [6, 7] as a framework for studying the similarity of arc-annotated sequences. In this paper, we consider arc-annotated sequences with various arc structures and present some new algorithmic and complexity...
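When neither sequence carries any arcs, the arc-preserving constraint is vacuous and the problem reduces to the ordinary longest common subsequence. The Python sketch below is that classic O(|x||y|) dynamic program, kept to two rows. It is only the arc-free special case; handling arcs is the subject of the paper. The function name is hypothetical.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence of x and y (classic two-row DP)."""
    prev = [0] * (len(y) + 1)  # prev[j] = LCS length of the previous prefix of x and y[:j]
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            # Extend the diagonal on a match, otherwise keep the better neighbour.
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

For example, "ABCBDAB" and "BDCABA" have longest common subsequences of length 4, such as "BCBA".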

chapter

Boyer-Moore String Matching over Ziv-Lempel Compressed Text

Gonzalo Navarro, Jorma Tarhio

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 166-180

We present a Boyer-Moore approach to string matching over LZ78 and LZW compressed text. The key idea is that, although we cannot choose exactly which text characters to inspect, we can still use the characters explicitly represented in those formats to shift the pattern in the text. We present a basic approach and more advanced ones. Although the theoretical average complexity does not improve...
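The shifting idea the abstract builds on can be seen on uncompressed text with the Horspool simplification of Boyer-Moore. The Python sketch below is that plain-text baseline, not the paper's algorithm: after each window comparison it shifts by the distance from the rightmost occurrence (excluding the last position) of the window's last character to the end of the pattern. The paper's point is to obtain such shifts when only the characters explicitly stored in LZ78/LZW codes are available. The function name is hypothetical.

```python
def horspool(text, pattern):
    """Return all start positions of pattern in text (Boyer-Moore-Horspool)."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []
    # Shift for each character = distance from its rightmost occurrence in
    # pattern[:-1] to the last pattern position; unseen characters shift by m.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    hits = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            hits.append(pos)
        pos += shift.get(text[pos + m - 1], m)
    return hits
```

For instance, `horspool("abracadabra", "abra")` inspects only three windows and reports occurrences at positions 0 and 7.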

chapter

A Boyer-Moore Type Algorithm for Compressed Pattern Matching

Yusuke Shibata, Tetsuya Matsumoto, Masayuki Takeda, Ayumi Shinohara, and others

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 181-194

We apply the Boyer-Moore technique to compressed pattern matching for text strings described in terms of collage systems, a formal framework that captures various dictionary-based compression methods. For a subclass of collage systems that contain no truncation, our new algorithm runs in O(‖D‖ + n·m + m² + r) time using O(‖D‖ + m²) space, where ‖D‖ is the size of dictionary D, n is the compressed...

chapter

Approximate String Matching over Ziv-Lempel Compressed Text

Juha Kärkkäinen, Gonzalo Navarro, Esko Ukkonen

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 195-209

We present a solution to the problem of performing approximate pattern matching on compressed text. The format we choose is the Ziv-Lempel family, specifically the LZ78 and LZW variants. Given a text of length u compressed into length n, and a pattern of length m, we report all the R occurrences of the pattern in the text allowing up to k insertions, deletions and substitutions, in O(mkn + R) time....
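For comparison, the classical uncompressed solution is Sellers' dynamic programming, which scans the decompressed text once in O(mu) time and reports every end position of an occurrence with at most k differences. The Python sketch below is that baseline, not the paper's compressed-text algorithm. It keeps one column of the table, where entry i is the smallest edit distance between the pattern prefix of length i and any substring ending at the current text position. The function name is hypothetical.

```python
def sellers(text, pattern, k):
    """End positions j where some substring ending at text[j] is within
    k insertions, deletions and substitutions of pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # before reading text: pattern[:i] vs the empty string
    ends = []
    for j, t in enumerate(text):
        prev_diag = col[0]  # old col[i-1], the diagonal predecessor
        col[0] = 0          # an occurrence may start anywhere in the text
        for i in range(1, m + 1):
            cur = min(col[i] + 1,                         # text char t left unmatched
                      col[i - 1] + 1,                     # pattern[i-1] left unmatched
                      prev_diag + (pattern[i - 1] != t))  # match or substitution
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends
```

With k = 1, the pattern "cde" is found in "abxdefg" ending at position 4, via the single substitution c→x.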

chapter

Improving Static Compression Schemes by Alphabet Extension

Shmuel T. Klein

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 210-221

The performance of data compression on a large static text may be improved if certain variable-length strings are included in the character set for which a code is generated. A new method for extending the alphabet is presented, based on a reduction to a graph-theoretic problem. A related optimization problem is shown to be NP-complete, a fast heuristic is suggested, and experimental results are presented.

chapter

Genome Rearrangement by Reversals and Insertions/Deletions of Contiguous Segments

Nadia El-Mabrouk

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 222-234

Analysis of genome rearrangements makes it possible to compare molecular data from species that diverged a very long time ago. Results and complexities are tightly related to the type of data and genome-level mutations considered. For sorted and signed data, Hannenhalli and Pevzner (HP) developed the first polynomial-time algorithm in the field. This algorithm solves the problem of sorting by reversals. In this paper,...
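To make "sorting by reversals" concrete, the Python sketch below computes the exact reversal distance of a small signed permutation by breadth-first search. A reversal inverts the order of a segment and flips the sign of every element in it. The search is exponential and is not the Hannenhalli-Pevzner algorithm, which computes the same distance in polynomial time. It also ignores the insertions and deletions of segments that the paper adds. Function names are hypothetical.

```python
from collections import deque


def apply_reversal(perm, i, j):
    """Reverse perm[i..j] (inclusive) and flip the sign of every element in it."""
    return perm[:i] + tuple(-x for x in reversed(perm[i:j + 1])) + perm[j + 1:]


def reversal_distance(perm):
    """Exact reversal distance of a signed permutation of 1..n to the identity (BFS).

    Exponential in n; for tiny inputs only.
    """
    start = tuple(perm)
    target = tuple(range(1, len(start) + 1))
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if p == target:
            return dist[p]
        for i in range(len(p)):
            for j in range(i, len(p)):
                q = apply_reversal(p, i, j)
                if q not in dist:
                    dist[q] = dist[p] + 1
                    queue.append(q)
```

For example, (-2, -1) is sorted by one reversal of the whole permutation, while (+2, +1) needs three.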

chapter

A Lower Bound for the Breakpoint Phylogeny Problem

David Bryant

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 235-247

Breakpoint phylogeny methods have been shown to be an effective way to extract phylogenetic information from gene order data. Currently, the only practical breakpoint phylogeny algorithms for the analysis of large genomes with varied gene content are heuristics with no optimality guarantee. Here we address this shortcoming by describing new bounds for the breakpoint median problem, and for the more...
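The quantity these methods optimize is easy to state. The simplified Python sketch below computes the breakpoint distance between two signed linear gene orders and the breakpoint median score of a candidate genome, that is, the sum of its breakpoint distances to the input genomes. It assumes linear genomes with identical gene content. Many breakpoint phylogeny formulations use circular genomes, and the paper targets genomes with varied gene content, so this is only an illustration. Function names are hypothetical.

```python
def canonical(x, y):
    """(x, y) and (-y, -x) describe the same adjacency read in opposite directions."""
    return min((x, y), (-y, -x))


def adjacencies(genome):
    """Set of canonical signed adjacencies of a linear genome."""
    return {canonical(x, y) for x, y in zip(genome, genome[1:])}


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 not present in g2 (same gene content assumed)."""
    return len(adjacencies(g1) - adjacencies(g2))


def median_score(candidate, genomes):
    """Breakpoint median objective: total breakpoint distance from candidate to all genomes."""
    return sum(breakpoint_distance(candidate, g) for g in genomes)
```

Reading a genome backwards with all signs flipped, as in [-3, -2, -1] versus [1, 2, 3], creates no breakpoints, while inverting the single gene 2 breaks both adjacencies.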

chapter

Structural Properties and Tractability Results for Linear Synteny

David Liben-Nowell, Jon Kleinberg

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 248-263

The syntenic distance between two species is the minimum number of fusions, fissions, and translocations required to transform one genome into the other. The linear syntenic distance, a restricted form of this model, has been shown to be close to the syntenic distance. Both distances are hard to compute and have resisted efficient approximation algorithms with non-trivial performance...

chapter

Shift Error Detection in Standardized Exams

Steven Skiena, Pavel Sumazin

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 264-276

Computer-graded multiple-choice examinations are a familiar and dreaded part of most students’ lives. Many test takers are particularly fearful of form-filling shift errors, where absent-mindedly marking the answer to (say) question 32 in position 31 causes a long run of answers to be successively displaced. Test-taking strategies where students answer questions out of sequence (such as answering...
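One simple way to look for the kind of shift error described above is sketched in Python below. It is a hypothetical detector of our own, not the paper's method. At each position i it scores +1 if the mark matches the key for question i + 1 and -1 if it matches the key for question i, then uses Kadane's maximum-subarray scan to find the contiguous run where "shifted by one" explains the marks best. It assumes equal-length answer strings and considers only the one-position shift from the abstract's example.

```python
def best_shift_run(key, marks):
    """Hypothetical shift-error detector.

    Returns (start, end, gain): over positions start..end-1, assuming marks[i] was
    meant for question i+1 gains `gain` correct answers. (0, 0, 0) means no run helps.
    """
    best_gain, best_span = 0, (0, 0)
    run_gain, run_start = 0, 0
    for i in range(len(marks) - 1):
        # +1 if the mark fits the next question, -1 if it fits its own question.
        g = (marks[i] == key[i + 1]) - (marks[i] == key[i])
        if run_gain <= 0:
            run_gain, run_start = g, i  # start a new run (Kadane's algorithm)
        else:
            run_gain += g
        if run_gain > best_gain:
            best_gain, best_span = run_gain, (run_start, i + 1)
    return best_span[0], best_span[1], best_gain
```

With key "ABCDBADC" and marks "ABDBADDC", positions 2 through 5 hold the answers to questions 3 through 6, and the scan reports the run (2, 6) with a gain of 4.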

chapter

An Upper Bound for Number of Contacts in the HP-Model on the Face-Centered-Cubic Lattice (FCC)

Rolf Backofen

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 277-292

Lattice protein models are a major tool for investigating principles of protein folding. For this purpose, one needs an algorithm that is guaranteed to find the minimal energy conformation in some lattice model (at least for some sequences). So far, there are only algorithms that can find optimal conformations in the cubic lattice. In the more interesting case of the face-centered-cubic lattice (FCC),...
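In the HP model the energy of a conformation is usually taken as minus the number of H-H contacts, so an upper bound on contacts bounds the achievable energy. The Python sketch below counts contacts for a conformation given as FCC coordinates, using the 12 FCC neighbor vectors (two coordinates ±1, one coordinate 0). It only evaluates a given conformation and does not search for optimal ones. Function and constant names are hypothetical.

```python
from itertools import combinations

# The 12 nearest-neighbour vectors of the FCC lattice: two coordinates are +-1, one is 0.
FCC_NEIGHBOURS = {
    (a, b, c)
    for a in (-1, 0, 1) for b in (-1, 0, 1) for c in (-1, 0, 1)
    if abs(a) + abs(b) + abs(c) == 2
}


def diff(p, q):
    return tuple(x - y for x, y in zip(p, q))


def hh_contacts(sequence, coords):
    """Count H-H contacts of an HP sequence placed on the FCC lattice.

    A contact is a pair of residues that are not consecutive in the chain but
    occupy neighbouring lattice points. Raises ValueError for an invalid walk.
    """
    if len(sequence) != len(coords) or len(set(coords)) != len(coords):
        raise ValueError("need one distinct lattice point per residue")
    for p, q in zip(coords, coords[1:]):
        if diff(p, q) not in FCC_NEIGHBOURS:
            raise ValueError("consecutive residues must be FCC neighbours")
    return sum(
        1
        for i, j in combinations(range(len(coords)), 2)
        if j - i > 1
        and sequence[i] == sequence[j] == "H"
        and diff(coords[i], coords[j]) in FCC_NEIGHBOURS
    )
```

Four residues on the corners of an FCC tetrahedron, (0,0,0), (1,1,0), (1,0,1) and (0,1,1), are pairwise neighbors, so "HHHH" placed there has 3 non-consecutive contacts and energy -3.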

chapter

The Combinatorial Partitioning Method

Matthew R. Nelson, Sharon L. Kardia, Charles F. Sing

Lecture Notes in Computer Science, Combinatorial Pattern Matching, Contributed Papers, pp. 293-304

Recent advances in genome technology have led to an exponential increase in the ability to identify and measure variation in a large number of genes in the human genome. However, statistical and computational methods to utilize this information on hundreds, and soon thousands, of variable DNA sites to investigate genotype-phenotype relationships have not kept pace. Because genotype-phenotype relationships...

INFONA - science communication portal

Combinatorial Pattern Matching
11th Annual Symposium, CPM 2000 Montreal, Canada, June 21–23, 2000 Proceedings
