Genome-Wide Comparative Gene Family Classification
Cited by: 22
Authors:

Frech, Christian
Affiliation:
Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada

Chen, Nansheng
Affiliation:
Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Affiliations:
[1] Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Source:
PLOS ONE | 2010 / Volume 5 / Issue 10
Funding:
Natural Sciences and Engineering Research Council of Canada;
Keywords:
CLUSTERING PROTEIN SEQUENCES;
CAENORHABDITIS-ELEGANS;
MODULAR ARCHITECTURE;
CHEMORECEPTOR GENES;
PHYLOGENETIC TREES;
IDENTIFICATION;
DATABASE;
EVOLUTION;
DUPLICATION;
RECEPTORS;
DOI:
10.1371/journal.pone.0013409
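Chinese Library Classification:
O [Mathematical sciences and chemistry];
P [Astronomy and earth sciences];
Q [Biological sciences];
N [General natural sciences];
Discipline classification codes: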
07 ;
0710 ;
09 ;
Abstract:
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
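The comparative strategy described in the abstract can be illustrated with a minimal sketch: cluster the reference species' genes at several parameter values, score each clustering against the curated families, and keep the best-scoring parameter for use on related genomes. The abstract does not state how agreement with manual curation is measured, so the pairwise F1 score below is an assumption, as are the function names, the toy gene identifiers, and the candidate parameter values (standing in for, e.g., TRIBE-MCL inflation settings). The clusterings are supplied as precomputed sets; no clustering program is invoked.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Return the set of unordered gene pairs that share a cluster."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def pairwise_f1(predicted, curated):
    """Pairwise F1 between a predicted clustering and curated families.

    Assumed scoring scheme, not taken from the paper: a gene pair counts as a
    true positive when both clusterings place the two genes together.
    """
    pred_pairs = co_clustered_pairs(predicted)
    ref_pairs = co_clustered_pairs(curated)
    true_pos = len(pred_pairs & ref_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_pairs)
    recall = true_pos / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


def select_parameter(clusterings_by_param, curated):
    """Pick the parameter whose clustering best matches the curated families."""
    return max(
        sorted(clusterings_by_param),
        key=lambda p: pairwise_f1(clusterings_by_param[p], curated),
    )


# Hypothetical curated families in the reference species.
curated_families = [{"a1", "a2", "a3"}, {"b1", "b2"}]

# Hypothetical clusterings of the same genes at three parameter values.
clusterings = {
    1.2: [{"a1", "a2", "a3", "b1", "b2"}],        # too coarse: families merged
    2.0: [{"a1", "a2", "a3"}, {"b1", "b2"}],      # matches curation
    5.0: [{"a1", "a2"}, {"a3"}, {"b1"}, {"b2"}],  # too fine: families split
}

best_param = select_parameter(clusterings, curated_families)
```

In this toy case the middle setting reproduces the curated families exactly, so it would be the value carried over to classify the corresponding families in a related, newly sequenced genome.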
Pages: 14