The role of the COG database in comparative and functional genomics

Cited by: 15
Authors
Kaufmann, Michael [1]
Affiliation
[1] Univ Witten Herdecke, Inst Neurobiochem, Prot Chem Grp, D-58448 Witten, Germany
Keywords
COG database; cluster of orthologous groups; orthologs; comparative genomics
DOI
10.2174/157489306777828017
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
A major breakthrough in classifying proteins from different microbial genomes in terms of sequence similarity was the development of the COG concept by Tatusov et al. in 1997. The authors defined clusters of orthologous groups of proteins (COGs) by strictly applying all-against-all BLAST alignments of protein sequences from completely sequenced microbial genomes. The latest update of the COG database already covered 66 microbial genomes and additionally included the KOG database, an equivalent consisting of seven eukaryotic genomes. Although excellent web-based software tools designed to analyze this huge amount of data were initially provided by the authors, many other groups independently developed more specialized or extended programs that use COG data for diverse purposes. Here, a brief introduction is given to the concept behind COGs, and their potential in the field of comparative and functional genomics is discussed. The review then focuses on the multitude of recently developed web services aimed at mining the COG database and addresses their capabilities to solve diverse problems in biochemistry. To illustrate the broad field of possible applications, a compilation of recently published findings that draw on comparative genomics, with emphasis on data retrieved from the COG database, is given.
Pages: 291-300
Page count: 10