The Bologna Annotation Resource: a Non Hierarchical Method for the Functional and Structural Annotation of Protein Sequences Relying on a Comparative Large-Scale Genome Analysis

Cited by: 10

Authors:

Bartoli, Lisa [1]

Montanucci, Ludovica [1]

Fronza, Raffaele [1]

Martelli, Pier Luigi [1]

Fariselli, Piero [1]

Carota, Luciana [2]

Donvito, Giacinto [3]

Maggi, Giorgio P. [3, 4]

Casadio, Rita [1]

Affiliations:

[1] Univ Bologna, Dept Biol, CIRB, Biocomp Grp, I-40126 Bologna, Italy

[2] Natl Inst Nucl Phys, CNAF INFN, Bologna, Italy

[3] Natl Inst Nucl Phys, BA INFN, Bari, Italy

[4] Politecn Bari, Dept Phys, Bari, Italy

Source:

JOURNAL OF PROTEOME RESEARCH | 2009 / Vol. 8 / Issue 09

Keywords:

protein functional annotation; cross-genome comparison; alignment coverage; Grid technology; CLASSIFICATION; DATABASE; ALIGNMENT; DIVERGENCE; TREMBL; REFSEQ;

DOI:

10.1021/pr900204r

Chinese Library Classification (CLC) number:

Q5 [Biochemistry];

Subject classification code:

071010 ; 081704 ;

Abstract:

Protein sequence annotation is a major challenge in the postgenomic era. Thanks to the availability of complete genomes and proteomes, protein annotation has recently benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric that ensures a reliable transfer of function between related proteins, even in the case of multidomain and distantly related proteins. The method takes advantage of the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and of a GO and PDB/SCOP mapping over the clusters. A statistical validation of our method demonstrates that our clustering technique captures the essential information shared between homologous and distantly related protein sequences. In this way, uncharacterized proteins can be safely annotated by inheriting the annotation of the cluster. We validate our method by blindly annotating a further 201 genomes, and finally we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
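
The clustering idea summarized in the abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that pairwise alignments are already available as (id_a, id_b, identity, aligned_length) tuples, that coverage is measured against the longer of the two sequences, and that two proteins are linked when both identity and coverage pass a threshold. The abstract states none of these details: the threshold values, the coverage definition, and the function names `cluster_by_alignment` and `inherit_annotation` are illustrative assumptions, and connected components via union-find stand in for whatever procedure the authors actually used.

```python
from collections import defaultdict


def cluster_by_alignment(pairs, lengths, min_identity=0.4, min_coverage=0.9):
    """Group sequences into non-hierarchical clusters (connected components).

    pairs   : iterable of (id_a, id_b, identity, aligned_length)
    lengths : dict mapping sequence id -> sequence length
    The default thresholds are illustrative assumptions, not values
    taken from the abstract.
    """
    parent = {seq_id: seq_id for seq_id in lengths}

    def find(x):
        # Find the component representative, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, identity, aligned_len in pairs:
        # Assumed coverage definition: aligned length over the longer sequence,
        # so a short domain hit on a long multidomain protein is rejected.
        coverage = aligned_len / max(lengths[a], lengths[b])
        if identity >= min_identity and coverage >= min_coverage:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for seq_id in lengths:
        clusters[find(seq_id)].add(seq_id)
    return list(clusters.values())


def inherit_annotation(cluster, annotations):
    """Return the union of annotation terms carried by members of a cluster."""
    terms = set()
    for seq_id in cluster:
        terms |= annotations.get(seq_id, set())
    return terms
```

Requiring high coverage on the longer sequence is one simple way to avoid linking proteins that share only a single domain, which is the failure mode the abstract alludes to for multidomain proteins. An uncharacterized member of a cluster would then receive the terms returned by `inherit_annotation` for that cluster.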


Pages: 4362-4371

Number of pages: 10

