Genome-Wide Comparative Gene Family Classification

Cited by: 22
Authors
Frech, Christian [1]
Chen, Nansheng [1]
Affiliation
[1] Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Source
PLOS ONE | 2010 / Vol. 5 / Issue 10
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
CLUSTERING PROTEIN SEQUENCES; CAENORHABDITIS-ELEGANS; MODULAR ARCHITECTURE; CHEMORECEPTOR GENES; PHYLOGENETIC TREES; IDENTIFICATION; DATABASE; EVOLUTION; DUPLICATION; RECEPTORS;
DOI
10.1371/journal.pone.0013409
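Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract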
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done to the same high standard. This project aims to develop a strategy for classifying gene families effectively and accurately across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter settings: different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized appropriately. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
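The parameter search described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the clustering program has already been run at several parameter values (for example, several inflation values for a Markov clustering program) and that each result is available as a list of gene sets. The scoring measure (F1 of the best-matching cluster), the tie-breaking rule, the function names, and the gene identifiers are all assumptions made for illustration; the record does not state which measure the paper uses.

```python
from typing import Dict, List, Set, Tuple


def best_cluster_f1(family: Set[str], clusters: List[Set[str]]) -> float:
    """Return the F1 score of the cluster that best matches a curated family.

    F1 is used here as an assumed agreement measure, not one taken from the paper.
    """
    best = 0.0
    for cluster in clusters:
        overlap = len(family & cluster)
        if overlap == 0:
            continue  # this cluster shares no genes with the curated family
        precision = overlap / len(cluster)  # fraction of the cluster that is in the family
        recall = overlap / len(family)      # fraction of the family captured by the cluster
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def pick_parameter(
    family: Set[str], clusterings: Dict[float, List[Set[str]]]
) -> Tuple[float, float]:
    """Pick the parameter value whose clustering best reproduces the curated family.

    Ties are broken in favour of the smaller parameter value (an arbitrary choice).
    """
    scored = [
        (best_cluster_f1(family, clusters), param)
        for param, clusters in clusterings.items()
    ]
    best_score, best_param = max(scored, key=lambda item: (item[0], -item[1]))
    return best_param, best_score


# Hypothetical curated family in the reference species (gene IDs are made up).
curated = {"g1", "g2", "g3"}

# Hypothetical clusterings of the reference genes at three parameter values.
clusterings = {
    1.5: [{"g1", "g2", "g3", "g4", "g5"}],            # too coarse: family is merged with others
    2.0: [{"g1", "g2", "g3"}, {"g4", "g5"}],          # matches the curated family
    4.0: [{"g1", "g2"}, {"g3"}, {"g4"}, {"g5"}],      # too fine: family is split
}

param, score = pick_parameter(curated, clusterings)
print(param, score)
```

The selected parameter value would then be reused when clustering genes from the related, newly sequenced genomes, which is the step the abstract describes as finding suitable parameters from a reference species.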
Pages: 14