The Infona portal uses cookies, i.e. strings of text saved by a browser on the user's device. The portal can access those files and use them to remember the user's data, such as their chosen settings (screen view, interface language, etc.), or their login data. By using the Infona portal the user accepts automatic saving and using this information for portal operation purposes. More information on the subject can be found in the Privacy Policy and Terms of Service. By closing this window the user confirms that they have read the information on cookie usage, and they accept the privacy policy and the way cookies are used by the portal. You can change the cookie settings in your browser.
The problem of sorting a signed permutation by reversals is inspired by genome rearrangements in computational molecular biology. Given two genomes represented as two signed permutations of the same elements (e.g. orthologous genes), the problem consists in finding a most parsimonious scenario of reversals that transforms one genome into the other. We propose a method for sorting a signed permutation...
The perfect phylogeny model for haplotype evolution has been successfully applied to haplotype resolution from genotype data. In this study we explore the application of the perfect phylogeny model to other problems in the design and analysis of genetic studies. We consider a novel type of data, xor-genotypes, which distinguish heterozygote from homozygote sites but do not identify the homozygote...
We consider the problem of sorting linear and circular permutations and 0/1 sequences by reversals in a length-sensitive cost model. We extend the results on sorting by length-weighted reversals in two directions: we consider the signed case for linear sequences and also the signed and unsigned cases for circular sequences. We give lower and upper bounds as well as guaranteed approximation ratios...
Optimized spaced seeds improve sensitivity and specificity in local homology search [1]. Recently, several authors [2-4] have shown that multiple seeds can have better sensitivity and specificity than single seeds. We describe a linear programming-based algorithm to optimize a set of seeds. Our algorithm offers a performance guarantee: the sensitivity of a chosen seed set is at least 70% of what can...
Given two undirected trees T and P, the Subtree Homeomorphism Problem is to find whether T has a subtree t that can be transformed into P by removing entire subtrees, as well as repeatedly removing a degree-2 node and adding the edge joining its two neighbors. In this paper we extend the Subtree Homeomorphism Problem to a new optimization problem by enriching the subtree-comparison with node-to-node...
In this paper we study the average behavior of the number of distinct substrings in a text of size n over an alphabet of cardinality k. This quantity is called the complexity index and it captures the “richness of the language” used in a sequence. For example, sequences with low complexity index contain a large number of repeated substrings and they eventually become periodic (e.g., tandem repeats...
The point set pattern matching problem is, given two sets “pattern” and “text” of points in Euclidean space, to find a linear transformation that maps the pattern to a subset of the text. We introduce an approximate point set pattern matching for axis-sorted point sequences that allows a translation, space insertions and deletions between points. We present an approximate pattern matching algorithm...
Given a matrix X composed of symbols, a bicluster is a submatrix of X obtained by removing some of the rows and some of the columns of X in such a way that each row of what is left reads the same string. In this paper, we are concerned with the problem of finding the bicluster with the largest area in a large matrix X. The problem is first proved to be NP-complete. We present a fast and efficient...
We study a problem of efficient utilisation of extra memory space in real-time string matching. We propose, for any constant ε >0, a real-time string matching algorithm claiming O(mε) extra space, where m is the size of a pattern. All previously known real-time string matching algorithms use Ω(m) extra space.
Given a set S ={s1,s2,...,sn } of strings each of length m, and an integer L, we study the following two problems. k -Closest Substring problem: find k center strings c1,c2 ,...,ck of length L minimizing d such that for each sj ∈...
We describe a new technique for reducing the number of nodes and symbols in automata based on tries. The technique stems from some results on anti-dictionaries for data compression and does not need to retain the input string, differently from other methods based on compact automata. The net effect is that of obtaining a lighter automaton than the directed acyclic word graph (DAWG) of Blumer et al...
We consider succinct, or highly space-efficient, representations of a (static) string consisting of n pairs of balanced parentheses, that support natural operations such as finding the matching parenthesis for a given parenthesis, or finding the pair of parentheses that most tightly enclose a given pair. This problem was considered by Jacobson, [Proc. 30th FOCS, 549–554, 1989] and Munro and Raman,...
The problem of aligning two sequences A and B to determine their similarity is one of the fundamental problems in pattern matching. A challenging, basic variation of the sequence similarity problem is the incremental string comparison problem, denoted Consecutive Suffix Alignment, which is, given two strings A and B, to compute the alignment solution of each suffix of A versus B. Here, we present...
We study the Submass Finding Problem: Given a string s over a weighted alphabet, i.e., an alphabet Σ with a weight function $\mu:\Sigma \to {\mathbb N}$ , decide for an input mass M whether s has a substring whose weights sum up to M. If M is indeed a submass, then we want to find one or all occurrences of such substrings. We present efficient algorithms for both the decision and the search problem...
Given a collection of trees on n leaves with identical leaf set, the Mast, resp. Mct, problem consists in finding a largest subset of leaves such that all input trees restricted to these leaves are isomorphic, resp. have a common refinement. For Mast, resp. Mct, on k rooted trees, we give an O(min{3pkn,2.27p+kn3}) exact algorithm,...
For a set of rooted, unordered, distinctly leaf-labeled trees, the NP-hard maximum agreement subtree problem (MAST) asks for a tree contained (up to isomorphism or homeomorphism) in all of the input trees with as many labeled leaves as possible. We study the ordered variants of MAST where the trees are uniformly or non-uniformly ordered. We provide the first known polynomial-time algorithms for the...
Phylogenetics is a science of determining connections between groups of organisms in terms of ancestor/descendent relationships, usually expressed by phylogenetic trees, also called “trees of life”, cladograms, or dendograms. In parsimony approach to reconstruct the phylogenetic trees, the goal is to find the most parsimonious tree, i.e., the tree requiring the smallest number/score of evolutionary...
In this paper we investigate the protein sequence design (PSD) problem (also known as the inverse protein folding problem) under the Canonical modelon 2D and 3D lattices [12,25]. The Canonical model is specified by (i) a geometric representation of a target protein structure with amino acid residues via its contact graph, (ii) a binary folding code in which the amino acids are classified as hydrophobic...
This paper addresses the problem of aligning multiple sequences of non-coding RNA genes. We approach this problem with the biologically motivated paradigm that scoring of ncRNA alignments should be based primarily on secondary structure rather than nucleotide conservation. We introduce a novel graph theoretic model (NLG) for analyzing algorithms based on this approach, prove that the RNA multiple...
Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.