In this paper, we study lower bound techniques for branch-and-bound algorithms for maximum parsimony, with a focus on gene order data. We give a simple O(n³) time dynamic programming algorithm for computing the maximum circular ordering lower bound, where n is the number of leaves. The well-known gene order phylogeny program, GRAPPA, currently implements two heuristic approximations to...
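A minimal brute-force sketch of a circular ordering lower bound, assuming the bound is the maximum, over circular leaf orders induced by planar embeddings of the tree, of half the tour length. If `d` is a pairwise genome distance satisfying the triangle inequality and each tree edge is weighted by the distance between its endpoint genomes, then a tour that follows an embedding-induced order walks every tree edge exactly twice, so the tree length is at least half the tour length. This enumerates orders exponentially and is not the paper's O(n³) dynamic program; the names `leaf_orders`, `half_tour` and `max_circular_bound` are hypothetical.

```python
def leaf_orders(tree):
    """Yield every leaf order obtained by swapping the children of internal nodes.

    A tree is either a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro
            yield ro + lo

def half_tour(order, d):
    """Half the length of the closed tour visiting the leaves in this circular order."""
    n = len(order)
    return sum(d[frozenset((order[i], order[(i + 1) % n]))] for i in range(n)) / 2

def max_circular_bound(tree, d):
    """Brute-force maximum of half_tour over all orders the tree induces (exponential)."""
    return max(half_tour(order, d) for order in leaf_orders(tree))
```

For example, with `tree = (("A", "B"), ("C", "D"))` and distances keyed by `frozenset` pairs, the bound is the largest half-tour among the eight child-swapped orders.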
We study the problem of phylogenetic reconstruction based on gene order for whole genomes. We define three genomic distances between whole genomes represented by signed sequences, based on the matching of similar segments of genes and on the notions of breakpoints, conserved intervals and common intervals. We use these distances and distance based phylogenetic reconstruction methods to compute a phylogeny...
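Of the three distances, the breakpoint distance is the simplest to state. The sketch below assumes each gene occurs exactly once in each genome (for example, after duplicated segments have been resolved by a matching step), that genomes are linear signed sequences, and that no framing caps are added at the chromosome ends. Conserved and common intervals are not covered; the function names are hypothetical.

```python
def adjacencies(genome):
    """Signed adjacencies of a genome, stored in both reading directions."""
    adj = set()
    for a, b in zip(genome, genome[1:]):
        adj.add((a, b))
        adj.add((-b, -a))  # the same adjacency read on the opposite strand
    return adj

def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that are absent from g2."""
    adj2 = adjacencies(g2)
    return sum((a, b) not in adj2 for a, b in zip(g1, g1[1:]))
```

For instance, reversing the segment `2 3` in `[1, 2, 3, 4]` gives `[1, -3, -2, 4]`, which breaks the adjacencies `(1, 2)` and `(3, 4)` and so is at breakpoint distance 2.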
Most genome rearrangement studies are based on the assumption that the compared genomes contain unique gene copies. This is clearly unsuitable for species with duplicated genes or when local alignment tools provide many ambiguous hits for the same gene. In this paper, we compare different measures of order conservation to select, among a gene family, the pair of copies in two genomes that best reflects...
We propose a detailed model of evolution of exon-intron structure of eukaryotic genes that takes into account gene-specific intron gain and loss rates, branch-specific gain and loss coefficients, invariant sites incapable of intron gain, and rate variability of both gain and loss which is gamma-distributed across sites. We develop an expectation-maximization algorithm to estimate the parameters of...
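The building block such models extend is a two-state continuous-time Markov chain per intron site, with state 0 (absent), state 1 (present), gain rate g and loss rate l. Its transition probabilities have a closed form, for example P(present at t | absent at 0) = g/(g+l) · (1 − e^{−(g+l)t}). The minimal sketch below assumes constant rates and omits the gene-specific rates, branch-specific coefficients, invariant sites and gamma-distributed rate variation described above; the function name is hypothetical.

```python
import math

def intron_transition_matrix(gain, loss, t):
    """P[i][j] = probability of state j after time t, starting from state i.

    State 0 = intron absent, state 1 = intron present.
    Assumes constant per-site rates with gain + loss > 0.
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1 - decay)  # absent -> present
    p10 = loss / total * (1 - decay)  # present -> absent
    return [[1 - p01, p01],
            [p10, 1 - p10]]
```

As t grows, both rows approach the stationary distribution (l/(g+l), g/(g+l)), which is a quick sanity check on any fitted rates.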
Whether common ancestors of eukaryotes and prokaryotes had introns is one of the oldest unanswered questions in molecular evolution. Recently completed genome sequences have been used for comprehensive analyses of exon-intron organization in orthologous genes of diverse organisms, leading to more refined work on intron evolution. Large sets of intron presence-absence data require rigorous theoretical...
The OMA project is a large-scale effort to identify groups of orthologs from complete genome data, currently covering 150 species. The algorithm relies solely on protein sequence information and does not require any human supervision. It has several original features, in particular a verification step that detects paralogs and prevents them from being clustered together. Consistency checks and verification...
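For orientation, the sketch below shows the much simpler reciprocal-best-hit heuristic often used as a baseline for pairwise ortholog inference. It is not OMA's algorithm and has no paralog-verification step; it assumes a precomputed similarity score for each cross-genome gene pair, and the function name is hypothetical. Ties are broken by whichever pair is seen first.

```python
def reciprocal_best_hits(scores):
    """Pairs (a, b) where b is a's best hit and a is b's best hit.

    scores maps (gene_in_genome_A, gene_in_genome_B) -> similarity score.
    """
    best_a, best_b = {}, {}
    for (a, b), s in scores.items():
        if a not in best_a or s > best_a[a][1]:
            best_a[a] = (b, s)
        if b not in best_b or s > best_b[b][1]:
            best_b[b] = (a, s)
    return {(a, b) for a, (b, _) in best_a.items() if best_b[b][0] == a}
```

When two genes in one genome share the same best hit, only one of them is paired; deciding whether the other is a paralog is exactly the kind of question a dedicated verification step has to answer.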
There is widespread interest in comparative genomics in determining if historically and/or functionally related genes are spatially clustered in the genome, and whether the same sets of genes reappear in clusters in two or more genomes. We formalize and analyze the desirable properties of gene clusters and cluster definitions. Through detailed analysis of two commonly applied types of cluster, r-windows...
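As an illustration of an r-window style definition, the sketch below assumes a cluster is a pair of windows of r consecutive genes, one per genome, that share at least k genes. Genomes are lists of gene-family labels and orientation is ignored. It is an O(n·m·r) brute-force scan with hypothetical names, not an implementation taken from the paper.

```python
def r_window_clusters(g1, g2, r, k):
    """Start indices (i, j) of length-r windows in g1 and g2 sharing at least k genes."""
    hits = []
    for i in range(len(g1) - r + 1):
        w1 = set(g1[i:i + r])
        for j in range(len(g2) - r + 1):
            if len(w1 & set(g2[j:j + r])) >= k:
                hits.append((i, j))
    return hits
```

Overlapping windows report the same cluster several times, which is one of the properties a careful cluster definition has to account for.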
The String Barcoding (SBC) problem, introduced by Rash and Gusfield (RECOMB, 2002), consists of finding a minimum set of substrings that can be used to distinguish between all members of a given set of strings. In a computational biology context, the given strings represent a set of known viruses, while the substrings can be used as probes for a hybridization experiment via microarray. Eventually,...
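A probe separates two strings when it occurs in exactly one of them, and a valid barcode set separates every pair. The minimal sketch below uses a greedy set-cover heuristic, which is not guaranteed to find a minimum set and is not claimed to be the method of this paper. The bound `max_len` on candidate probe length and the function names are hypothetical.

```python
from itertools import combinations

def separates(probe, s, t):
    """True if the probe occurs in exactly one of the two strings."""
    return (probe in s) != (probe in t)

def greedy_barcode(strings, max_len=3):
    """Greedy heuristic for String Barcoding (result not guaranteed minimum).

    Candidates are all substrings of length 1..max_len; each round picks the
    probe that separates the most still-indistinguishable pairs.
    """
    candidates = sorted({s[i:i + k] for s in strings for k in range(1, max_len + 1)
                         for i in range(len(s) - k + 1)})
    unresolved = set(combinations(range(len(strings)), 2))
    probes = []
    while unresolved:
        gains = {p: sum(separates(p, strings[i], strings[j]) for i, j in unresolved)
                 for p in candidates}
        best = max(candidates, key=gains.get)
        if gains[best] == 0:
            raise ValueError("strings cannot be separated with probes up to max_len")
        probes.append(best)
        unresolved = {(i, j) for i, j in unresolved
                      if not separates(best, strings[i], strings[j])}
    return probes
```

Each string's barcode is then its presence/absence vector over the chosen probes, and these vectors are pairwise distinct by construction.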
In the half-century since the C-value paradox (the apparent lack of correlation between organismal genome size and morphological complexity) was described, there have been no explicit statistical comparisons between measures of genome size and organism complexity. It is reported here that there are significant positive correlations between measures of genome size and complexity with measures of non-hierarchical...
Identification of homologous chromosomal regions is important for understanding evolutionary processes that shape genome evolution, such as genome rearrangements and large scale duplication events. If these chromosomal regions have diverged significantly, statistical tests to determine whether observed similarities in gene content are due to history or chance are imperative. Currently available methods...
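One simple way to ask whether shared gene content could be due to chance is a hypergeometric tail probability. Assume N genes, each present once, a region of size1 genes in one genome, and a region of size2 genes drawn at random in the other; the p-value is the chance of seeing at least the observed number of shared genes. This simplified null model ignores gene order, compactness and duplicates, and is not one of the existing methods the abstract refers to; the function name is hypothetical.

```python
from math import comb

def shared_genes_pvalue(n_genes, size1, size2, shared):
    """P(X >= shared) for X ~ Hypergeometric(n_genes, size1, size2)."""
    total = comb(n_genes, size2)
    tail = sum(comb(size1, x) * comb(n_genes - size1, size2 - x)
               for x in range(shared, min(size1, size2) + 1))
    return tail / total
```

For example, two 3-gene regions in a 10-gene genome share all three genes with probability 1/C(10, 3) = 1/120 under this model.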
Gene cluster significance tests that are based on the number of genes in a cluster in two genomes, and how compactly they are distributed, but not their order, may be made more powerful by the addition of a test component that focuses solely on the similarity of the ordering of the common genes in the clusters in the two genomes. Here we suggest four such tests, compare them, and investigate one of...
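One possible order-only component counts gene pairs that appear in the same relative order in both clusters (a Kendall-tau style statistic). It takes the larger of that count and its complement so that a cluster read on the opposite strand also scores highly, and assesses significance with a Monte Carlo permutation test. This is an illustrative sketch, not necessarily one of the four tests proposed here. It assumes both lists contain exactly the common genes, each once; the function names and the `trials`/`seed` parameters are hypothetical.

```python
import random
from itertools import combinations

def order_statistic(order1, order2):
    """Concordant pairs, or discordant pairs if larger (allows a reversed cluster)."""
    pos = {g: i for i, g in enumerate(order2)}
    ranks = [pos[g] for g in order1]
    concordant = sum(ranks[i] < ranks[j] for i, j in combinations(range(len(ranks)), 2))
    n_pairs = len(ranks) * (len(ranks) - 1) // 2
    return max(concordant, n_pairs - concordant)

def order_test_pvalue(order1, order2, trials=10000, seed=0):
    """Fraction of random orderings at least as similar to order1 as order2 is."""
    rng = random.Random(seed)
    observed = order_statistic(order1, order2)
    shuffled = list(order2)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if order_statistic(order1, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)  # add-one correction keeps p > 0
```

Because this statistic ignores spacing, it can be combined with a content-and-compactness test as the abstract suggests.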
The objective function of genome rearrangement problems allows other genome-level problems to be integrated so that they can be solved simultaneously. Three examples, all of which are hard, are: 1) orientation assignment for unsigned genomes; 2) ortholog identification in the presence of multiple copies of genes; 3) linearisation of partially ordered genomes. The comparison of traditional genetic...
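To illustrate the first example, orientation assignment can be folded into a rearrangement objective by choosing the signs of an unsigned genome that minimise its distance to a signed reference. The sketch below assumes breakpoint distance as the objective (linear genomes, one copy per gene) and uses exhaustive search over all 2^n sign assignments, which is only feasible for tiny inputs given that the problem is hard. The function names are hypothetical.

```python
from itertools import product

def breakpoints(g1, g2):
    """Adjacencies of g1 missing from g2 (linear genomes, one copy per gene)."""
    adj2 = set()
    for a, b in zip(g2, g2[1:]):
        adj2.update({(a, b), (-b, -a)})  # store both reading directions
    return sum((a, b) not in adj2 for a, b in zip(g1, g1[1:]))

def best_orientation(unsigned, reference):
    """Try every sign assignment and return (score, signed genome) with fewest breakpoints."""
    best = None
    for signs in product((1, -1), repeat=len(unsigned)):
        signed = [s * g for s, g in zip(signs, unsigned)]
        score = breakpoints(signed, reference)
        if best is None or score < best[0]:
            best = (score, signed)
    return best
```

Here the orientation is not chosen in isolation: it is whatever makes the rearrangement objective smallest, which is the point of solving the two problems simultaneously.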
Asymmetric functional divergence of paralogues is a key aspect of the traditional model of evolution following duplication. If one gene continues to perform the ancestral function while the other copy evolves a new function then we might expect a period of accelerated sequence evolution following duplication in one of the copies. In keeping with this prediction, many individual examples of asymmetric...
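With an outgroup sequence, one standard way to test for unequal rates between two paralogues is Tajima's relative rate test. Here m1 counts sites where only the first copy differs (the second copy matches the outgroup), m2 counts the converse, and (m1 − m2)² / (m1 + m2) is compared with a chi-square distribution with one degree of freedom. This is not claimed to be the method used in this study. The sketch assumes a gap-free alignment of equal-length sequences, and the function name is hypothetical.

```python
import math

def tajima_relative_rate(seq1, seq2, outgroup):
    """Relative rate test on three aligned sequences.

    Returns (m1, m2, chi2, p); chi2 has 1 degree of freedom.
    """
    m1 = sum(a != b and b == c for a, b, c in zip(seq1, seq2, outgroup))
    m2 = sum(b != a and a == c for a, b, c in zip(seq1, seq2, outgroup))
    if m1 + m2 == 0:
        return m1, m2, 0.0, 1.0
    chi2 = (m1 - m2) ** 2 / (m1 + m2)
    p = math.erfc(math.sqrt(chi2 / 2))  # survival function of chi-square with 1 df
    return m1, m2, chi2, p
```

A small p-value indicates that one copy has accumulated unique substitutions faster than the other, which is the asymmetry the traditional model predicts.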
Gene rearrangements have been used successfully in phylogenetic reconstruction and comparative genomics, but usually under the assumption that all genomes have the same gene content and that no gene is duplicated. While these assumptions allow one to work with organellar genomes, they are too restrictive for nuclear genomes. The main challenge in handling more realistic data is how to deal with gene...