A simple algorithm to infer gene duplication and speciation events on a gene tree

Cited by: 143
Authors
Zmasek, CM [1]
Eddy, SR [1]
Affiliations
[1] Washington Univ, Dept Genet, Howard Hughes Med Inst, Sch Med, St Louis, MO 63110 USA
DOI
10.1093/bioinformatics/17.9.821
Chinese Library Classification number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
Motivation: When analyzing protein sequences using sequence similarity searches, orthologous sequences (that diverged by speciation) are more reliable predictors of a new protein's function than paralogous sequences (that diverged by gene duplication), because duplication enables functional diversification. The utility of phylogenetic information in high-throughput genome annotation ('phylogenomics') is widely recognized, but existing approaches are either manual or indirect (e.g. not based on phylogenetic trees). Our goal is to automate phylogenomics using explicit phylogenetic inference. A necessary component is an algorithm to infer speciation and duplication events in a given gene tree.

Results: We give an algorithm to infer speciation and duplication events on a gene tree by comparison to a trusted species tree. This algorithm has a worst-case running time of O(n^2), which is inferior to two previous algorithms that run in approximately O(n) time for a gene tree of n sequences. However, our algorithm is extremely simple, and its asymptotic worst-case behavior is only realized on pathological data sets. We show empirically, using 1750 gene trees constructed from the Pfam protein family database, that it appears to be a practical (and often superior) algorithm for analyzing real gene trees.
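
To illustrate what comparing a gene tree to a species tree can look like, here is a minimal Python sketch. The abstract does not spell out the procedure, so this is an assumption-laden illustration and not the authors' implementation: it maps each gene-tree node to the lowest common ancestor of its children's species by walking parent pointers, and labels a node a duplication when it maps to the same species-tree node as one of its children. The class `Node` and the functions `set_depths`, `lca`, and `reconcile` are hypothetical names, and both trees are assumed to be rooted and binary.

```python
class Node:
    """Node of a rooted binary tree (hypothetical helper, used for both trees)."""
    def __init__(self, name="", children=(), species=None):
        self.name = name
        self.children = list(children)
        self.parent = None
        self.depth = 0                # distance from root (species tree only)
        self.species = species        # species-tree leaf (gene-tree leaves only)
        self.mapped = None            # species-tree node (gene tree only)
        self.is_duplication = False   # event label (gene-tree internal nodes)
        for child in self.children:
            child.parent = self


def set_depths(node, depth=0):
    """Record each species-tree node's distance from the root."""
    node.depth = depth
    for child in node.children:
        set_depths(child, depth + 1)


def lca(a, b):
    """Climb from the deeper node until both pointers meet.

    Each call can take time proportional to the species-tree height,
    which is why a sketch like this is quadratic in the worst case.
    """
    while a is not b:
        if a.depth > b.depth:
            a = a.parent
        else:
            b = b.parent
    return a


def reconcile(gene_node):
    """Postorder pass: map every gene-tree node and label duplications."""
    if not gene_node.children:
        gene_node.mapped = gene_node.species
        return
    for child in gene_node.children:
        reconcile(child)
    left, right = gene_node.children
    gene_node.mapped = lca(left.mapped, right.mapped)
    # Duplication if the node maps to the same place as either child.
    gene_node.is_duplication = (
        gene_node.mapped is left.mapped or gene_node.mapped is right.mapped
    )


# Toy species tree: ((human, mouse), fly)
human, mouse, fly = Node("human"), Node("mouse"), Node("fly")
mammals = Node("mammals", [human, mouse])
species_root = Node("root", [mammals, fly])
set_depths(species_root)

# Toy gene tree: (((h_a, m_a), (h_b, m_b)), f)
pair_a = Node("pair_a", [Node("h_a", species=human), Node("m_a", species=mouse)])
pair_b = Node("pair_b", [Node("h_b", species=human), Node("m_b", species=mouse)])
dup = Node("dup", [pair_a, pair_b])
gene_root = Node("gene_root", [dup, Node("f", species=fly)])
reconcile(gene_root)
```

In this toy case, `pair_a` and `pair_b` each join a human and a mouse gene and map to `mammals` as speciations, while `dup` joins two subtrees that both map to `mammals` and is therefore labeled a duplication.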
Pages: 821-828
Number of pages: 8