Correct classification of genes into gene families is important for understanding gene function and evolution. Although the gene families of many species have been resolved with high accuracy, both computationally and experimentally, gene family classification in most newly sequenced genomes has not been carried out to the same standard. This project was designed to develop a strategy for classifying gene families across genomes effectively and accurately. We first examine and compare the performance of computer programs developed for automated gene family classification. We demonstrate that some programs, including the hierarchical average-linkage clustering algorithm MC-UPGMA and the popular Markov clustering algorithm TRIBE-MCL, can accurately reproduce manually curated gene families. However, their performance is highly sensitive to parameter settings: different gene families require different program parameters to be resolved correctly. To circumvent this parameterization problem, we have developed a comparative strategy for gene family classification. The strategy uses existing curated gene families from reference species to find suitable parameters for classifying genes in related genomes. To demonstrate its effectiveness, we use TRIBE-MCL to classify chemosensory and ABC transporter gene families in C. elegans and its four sister species. We conclude that fully automated programs can establish biologically accurate gene families if they are parameterized appropriately. Comparative gene family classification finds optimal parameters automatically, allowing rapid insight into the gene families of newly sequenced species.
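The comparative strategy described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: TRIBE-MCL is replaced by a simple threshold-based single-linkage clusterer so that the example runs without external tools, and the gene names, similarity scores, candidate parameter values, and the pairwise F1 score used to compare predictions with curated families are all hypothetical. In practice the tuned parameter would be a clustering setting such as the MCL inflation value.

```python
from itertools import combinations


def cluster(genes, sim, threshold):
    """Stand-in clusterer (NOT TRIBE-MCL): connected components over
    gene pairs whose similarity is at least `threshold`."""
    parent = {g: g for g in genes}

    def find(g):
        # Union-find lookup with path halving.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), score in sim.items():
        if score >= threshold:
            parent[find(a)] = find(b)

    families = {}
    for g in genes:
        families.setdefault(find(g), set()).add(g)
    return [frozenset(f) for f in families.values()]


def pair_set(families):
    """All unordered gene pairs that share a family."""
    return {frozenset(p) for fam in families for p in combinations(sorted(fam), 2)}


def pairwise_f1(predicted, curated):
    """Agreement between predicted and curated families, scored on co-membership pairs."""
    pred, true = pair_set(predicted), pair_set(curated)
    tp = len(pred & true)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(pred), tp / len(true)
    return 2 * precision * recall / (precision + recall)


def best_parameter(genes, sim, curated, candidates):
    """Pick the candidate parameter that best reproduces the curated families."""
    return max(candidates, key=lambda t: pairwise_f1(cluster(genes, sim, t), curated))


# Hypothetical reference species with two curated families.
ref_genes = ["a1", "a2", "a3", "b1", "b2"]
ref_sim = {("a1", "a2"): 0.9, ("a2", "a3"): 0.8, ("b1", "b2"): 0.7, ("a3", "b1"): 0.4}
ref_curated = [frozenset({"a1", "a2", "a3"}), frozenset({"b1", "b2"})]

# Step 1: tune the parameter against the curated reference families.
best = best_parameter(ref_genes, ref_sim, ref_curated, candidates=[0.3, 0.5, 0.95])

# Step 2: reuse the tuned parameter on a related, uncurated genome.
target_genes = ["x1", "x2", "y1", "y2"]
target_sim = {("x1", "x2"): 0.85, ("y1", "y2"): 0.6, ("x2", "y1"): 0.45}
target_families = cluster(target_genes, target_sim, best)
```

With these toy values, too low a threshold merges the two reference families and too high a threshold splits every gene into its own family, so the intermediate value is selected and then applied to the target genome. Scoring on co-membership pairs ignores singleton families; a real evaluation might need a measure that accounts for them.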