Comparing Genomes with duplications: A computational complexity point of view

被引:21
作者
Blin, Guillaume [1 ]
Chauve, Cedric
Fertin, Guillaume
Rizzi, Romeo
Vialette, Stephane
机构
[1] Univ Marne La Vallee, CNRS, UMR 8049, IGM Labinfo, F-77454 Marne La Vallee 2, France
[2] Simon Fraser Univ, Dept Math, Burnaby, BC V5A 1S6, Canada
[3] Univ Nantes, CNRS, FRE 2729, LINA, F-44322 Nantes 3, France
[4] Univ Udine, Dipartimento Matemat & Informat, I-33100 Udine, Italy
[5] Univ Paris 11, Fac Sci, CNRS, UMR 8623,LRI, F-91405 Orsay, France
关键词
comparative genomics; computational complexity; common intervals; maximum adjacency disruption number; summed adjacency disruption number;
D O I
10.1109/TCBB.2007.1069
中图分类号
Q5 [生物化学];
学科分类号
071010 ; 081704 ;
摘要
In this paper, we are interested in the computational complexity of computing (dis) similarity measures between two genomes when they contain duplicated genes or genomic markers, a problem that happens frequently when comparing whole nuclear genomes. Recently, several methods [1], [2] have been proposed that are based on two steps to compute a given (dis) similarity measure M between two genomes G(1) and G(2): First, one establishes a one-to-one correspondence between the genes of G1 and the genes of G2; second, once this correspondence is established, it explicitly defines a permutation and it is then possible to quantify their similarity using classical measures defined for permutations like the number of breakpoints. Hence, these methods rely on two elements: a way to establish a one-to-one correspondence between genes of a pair of genomes and a (dis) similarity measure for permutations. The problem is then, given a (dis) imilarity measure for permutations, compute a correspondence that defines an optimal permutation for this measure. We are interested here in two models to compute a one-to-one correspondence: the exemplar model, where all but one copy is deleted in both genomes for each gene family, and the matching model, which computes a maximal correspondence for each gene family. We show that, for these two models and for three (dis) similarity measures on permutations, namely, the number of common intervals, the maximum adjacency disruption (MAD) number, and the summed adjacency disruption (SAD) number, the problem of computing an optimal correspondence is NP-complete and even APX-hard for the MAD number and the SAD number.
引用
收藏
页码:523 / 534
页数:12
相关论文
共 28 条
[1]  
ANGIBAUD S, 2006, INAPPROXIMABILITY SI
[2]  
Angibaud S, 2006, LECT NOTES COMPUT SC, V4205, P75
[3]   Genome rearrangement distances and gene order phylogeny in γ-proteobacteria [J].
Belda, E ;
Moya, A ;
Silva, FJ .
MOLECULAR BIOLOGY AND EVOLUTION, 2005, 22 (06) :1456-1467
[4]  
BERGERON A, 2003, P 9 ANN INT C COMP C, P68
[5]  
Blin G, 2005, LECT NOTES COMPUT SC, V3595, P22, DOI 10.1007/11533719_5
[6]  
BLIN G, 2006, P RECOMB INT WORKSH, P24
[7]  
Blin G., 2004, P 1 C ALGORITHMS COM
[8]   Comparative architectures of mammalian and chicken genomes reveal highly variable rates of genomic rearrangements across different lineages [J].
Bourque, G ;
Zdobnov, EM ;
Bork, P ;
Pevzner, PA ;
Tesler, G .
GENOME RESEARCH, 2005, 15 (01) :98-110
[9]  
BOURQUE G, 2005, P 3 RECOMB SAT WORKS, P21
[10]  
Bryant D., 2000, Comparative Genomics: Empirical and Analytical Approaches to Gene Order Dynamics, Map Alignment, and the Evolution of Gene Families, DOI DOI 10.1007/978-94-011-4309-7