Tree pattern matching in phylogenetic trees: automatic search for orthologs or paralogs in homologous gene sequence databases

Cited by: 122
Authors
Dufayard, JF
Duret, L
Penel, S
Gouy, M
Rechenmann, F
Perrière, G
Affiliations
[1] Univ Lyon 1, CNRS, UMR 5558, Lab Biometrie & Biol Evolut, F-69688 Villeurbanne, France
[2] INRIA Rhone Alpes, Montbonnot St Martin, St Ismier, France
DOI
10.1093/bioinformatics/bti325
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Comparative sequence analysis is widely used to study genome function and evolution. This approach first requires identifying homologous genes and then interpreting their homology relationships (orthology or paralogy). To help with this complex task, we developed three databases of homologous genes containing sequences, multiple alignments and phylogenetic trees: HOBACGEN, HOVERGEN and HOGENOM. In this paper, we present two new tools for automating the search for orthologs or paralogs in these databases.
Results: First, we developed and implemented an algorithm that infers speciation and duplication events by comparing gene and species trees (tree reconciliation). Second, we developed a general method for searching our databases for gene families whose tree topology matches a particular tree pattern. This unordered tree pattern matching algorithm has been implemented in the FamFetch graphical interface. Using a graphical editor, the user can specify the topology of the tree pattern and set constraints on its nodes and leaves. The pattern is then compared with all the phylogenetic trees in the database to retrieve the families in which one or more occurrences of the pattern are found. By specifying ad hoc patterns, it is therefore possible to identify orthologs in our databases.
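The two ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation or the FamFetch code. It assumes trees are nested tuples, gene-tree leaves are labelled only by their species, reconciliation uses the standard last-common-ancestor (LCA) mapping, and a pattern leaf of "*" matches any subtree. All function names (species_sets, lca, label_events, matches, find_occurrences) are hypothetical and chosen for illustration.

```python
from itertools import permutations


def species_sets(tree):
    """Return the set of species found under a nested-tuple (sub)tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(species_sets(child) for child in tree))


def lca(species_tree, wanted):
    """Return the smallest species-tree clade (as a species set) containing `wanted`."""
    node = species_tree
    while not isinstance(node, str):
        for child in node:
            if wanted <= species_sets(child):
                node = child  # descend while a single child still covers `wanted`
                break
        else:
            break  # no single child covers `wanted`: this node is the LCA
    return species_sets(node)


def label_events(gene_tree, species_tree):
    """Label each internal gene-tree node (in post-order) as duplication or speciation."""
    events = []

    def visit(node):
        if isinstance(node, str):
            return lca(species_tree, frozenset([node]))
        child_maps = [visit(child) for child in node]
        here = lca(species_tree, frozenset().union(*child_maps))
        # LCA rule: a node that maps to the same clade as one of its children
        # is inferred to be a duplication; otherwise it is a speciation.
        events.append((node, "duplication" if here in child_maps else "speciation"))
        return here

    visit(gene_tree)
    return events


def matches(pattern, tree):
    """Unordered match of `pattern` against the root of `tree` ('*' is a wildcard)."""
    if pattern == "*":
        return True
    if isinstance(pattern, str) or isinstance(tree, str):
        return pattern == tree
    if len(pattern) != len(tree):
        return False
    # Child order is ignored: try every ordering of the tree's children.
    return any(
        all(matches(p, t) for p, t in zip(pattern, order))
        for order in permutations(tree)
    )


def find_occurrences(pattern, tree):
    """Return every subtree of `tree` whose root matches `pattern`."""
    found = [tree] if matches(pattern, tree) else []
    if not isinstance(tree, str):
        for child in tree:
            found += find_occurrences(pattern, child)
    return found


if __name__ == "__main__":
    species_tree = (("human", "mouse"), "chicken")
    gene_tree = ((("human", "mouse"), ("human", "mouse")), "chicken")
    for node, event in label_events(gene_tree, species_tree):
        print(event, node)
    print(find_occurrences(("mouse", "human"), gene_tree))
```

Trying every permutation of children keeps the sketch short but is exponential in the number of children per node. It is adequate for binary trees, whereas a production tool would need a more efficient matching strategy and richer node constraints, such as the node and leaf constraints the abstract mentions.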
Pages: 2596-2603
Number of pages: 8