Genome-Wide Comparative Gene Family Classification
Cited by: 22
Authors:

Frech, Christian
Affiliation:
Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada

Chen, Nansheng
Affiliation:
Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Affiliations:
[1] Simon Fraser Univ, Dept Mol Biol & Biochem, Burnaby, BC V5A 1S6, Canada
Source:
Funding:
Natural Sciences and Engineering Research Council of Canada;
Keywords:
CLUSTERING PROTEIN SEQUENCES;
CAENORHABDITIS-ELEGANS;
MODULAR ARCHITECTURE;
CHEMORECEPTOR GENES;
PHYLOGENETIC TREES;
IDENTIFICATION;
DATABASE;
EVOLUTION;
DUPLICATION;
RECEPTORS;
DOI:
10.1371/journal.pone.0013409
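Chinese Library Classification (CLC):
O [Mathematical sciences and chemistry];
P [Astronomy and earth sciences];
Q [Biological sciences];
N [General natural sciences];
Discipline classification codes: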
07 ;
0710 ;
09 ;
Abstract:
Correct classification of genes into gene families is important for understanding gene function and evolution. Although gene families of many species have been resolved both computationally and experimentally with high accuracy, gene family classification in most newly sequenced genomes has not been done with the same high standard. This project has been designed to develop a strategy to effectively and accurately classify gene families across genomes. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can reconstruct manual curation of gene families accurately. However, their performance is highly sensitive to parameter setting, i.e. different gene families require different program parameters for correct resolution. To circumvent the problem of parameterization, we have developed a comparative strategy for gene family classification. This strategy takes advantage of existing curated gene families of reference species to find suitable parameters for classifying genes in related genomes. To demonstrate the effectiveness of this novel strategy, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if parameterized accordingly. Comparative gene family classification finds optimal parameters automatically, thus allowing rapid insights into gene families of newly sequenced species.
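The comparative strategy described in the abstract can be illustrated with a small sketch: cluster the reference species under several candidate parameter values, score each result against the curated families, and carry the best-scoring value over to related genomes. The abstract does not say how agreement with curation is measured, so the F-measure used here is an assumption, as are the function names, the toy gene identifiers, and the candidate parameter values (imagined as settings of a clustering program such as TRIBE-MCL). The clusterings are hard-coded rather than produced by running any real program.

```python
from statistics import mean


def best_f1(family, clusters):
    """Best F-measure between one curated family and any predicted cluster."""
    best = 0.0
    for cluster in clusters:
        overlap = len(family & cluster)
        if overlap == 0:
            continue
        precision = overlap / len(cluster)
        recall = overlap / len(family)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best


def pick_parameter(clusterings, curated):
    """Return (parameter, score) for the clustering closest to the curation."""
    scores = {
        param: mean(best_f1(family, clusters) for family in curated.values())
        for param, clusters in clusterings.items()
    }
    best_param = max(scores, key=scores.get)
    return best_param, scores[best_param]


# Hypothetical curated families of a reference species (toy gene IDs).
curated = {
    "famA": {"a1", "a2", "a3"},
    "famB": {"b1", "b2"},
}

# Hypothetical clusterings of the same genes under three parameter values.
clusterings = {
    1.2: [{"a1", "a2", "a3", "b1", "b2"}],        # too coarse: families merged
    2.0: [{"a1", "a2", "a3"}, {"b1", "b2"}],      # agrees with the curation
    5.0: [{"a1"}, {"a2", "a3"}, {"b1"}, {"b2"}],  # too fine: families split
}

best_param, best_score = pick_parameter(clusterings, curated)
print(best_param, best_score)
```

In this toy setup the selected value would then be reused to cluster genes from a related, uncurated genome, which is the step the abstract refers to as finding parameters automatically.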
Pages: 14