The Bologna Annotation Resource: a Non Hierarchical Method for the Functional and Structural Annotation of Protein Sequences Relying on a Comparative Large-Scale Genome Analysis

Cited by: 10
Authors
Bartoli, Lisa [1]
Montanucci, Ludovica [1]
Fronza, Raffaele [1]
Martelli, Pier Luigi [1]
Fariselli, Piero [1]
Carota, Luciana [2]
Donvito, Giacinto [3]
Maggi, Giorgio P. [3, 4]
Casadio, Rita [1]
Affiliations
[1] Univ Bologna, Dept Biol, CIRB, Biocomp Grp, I-40126 Bologna, Italy
[2] Natl Inst Nucl Phys, CNAF INFN, Bologna, Italy
[3] Natl Inst Nucl Phys, BA INFN, Bari, Italy
[4] Politecn Bari, Dept Phys, Bari, Italy
Keywords
protein functional annotation; cross-genome comparison; alignment coverage; Grid technology; CLASSIFICATION; DATABASE; ALIGNMENT; DIVERGENCE; TREMBL; REFSEQ;
DOI
10.1021/pr900204r
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Protein sequence annotation is a major challenge in the postgenomic era. With complete genomes and proteomes now available, protein annotation has benefited greatly from cross-genome comparisons. In this work, we describe a new non-hierarchical clustering procedure characterized by a stringent metric that ensures reliable transfer of function between related proteins, even for multidomain and distantly related proteins. The method draws on the comparative analysis of 599 completely sequenced genomes, from both prokaryotes and eukaryotes, and on a mapping of GO and PDB/SCOP annotations onto the clusters. A statistical validation shows that the clustering technique captures the essential information shared between homologous and distantly related protein sequences. As a result, uncharacterized proteins can be safely annotated by inheriting the annotation of their cluster. We validate the method by blindly annotating 201 additional genomes and, finally, we develop BAR (the Bologna Annotation Resource), a prediction server for protein functional annotation based on a total of 800 genomes (publicly available at http://microserf.biocomp.unibo.it/bar/).
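The abstract does not spell out the clustering metric, so the following is only a minimal sketch of the general idea: link two proteins when a pairwise alignment passes both an identity and a coverage cutoff, take the connected components of the resulting graph as non-hierarchical clusters, and let unannotated members inherit the cluster's annotation. The threshold values, the function names `cluster_proteins` and `transfer_annotations`, the input tuple layout, and the choice to transfer the union of all terms are assumptions made for illustration, not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical cutoffs: the abstract only says the metric is "stringent".
MIN_IDENTITY = 0.40
MIN_COVERAGE = 0.90


def cluster_proteins(proteins, hits,
                     min_identity=MIN_IDENTITY, min_coverage=MIN_COVERAGE):
    """Return clusters as connected components of the similarity graph.

    `hits` is an iterable of (protein_a, protein_b, identity, coverage)
    tuples, with identity and coverage expressed as fractions in [0, 1].
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b, identity, coverage in hits:
        # An edge exists only when both constraints hold.
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    clusters = defaultdict(set)
    for p in proteins:
        clusters[find(p)].add(p)
    return list(clusters.values())


def transfer_annotations(clusters, known):
    """Assign each unannotated protein the union of terms in its cluster.

    `known` maps protein id -> set of annotation terms (e.g. GO ids).
    Taking the plain union is an assumption; it skips any statistical
    validation of which terms are characteristic of the cluster.
    """
    inferred = {}
    for members in clusters:
        terms = set()
        for p in members:
            terms |= known.get(p, set())
        if not terms:
            continue  # nothing to inherit in this cluster
        for p in members:
            if p not in known:
                inferred[p] = set(terms)
    return inferred


proteins = ["P1", "P2", "P3", "P4"]
hits = [
    ("P1", "P2", 0.55, 0.95),  # passes both cutoffs
    ("P2", "P3", 0.45, 0.60),  # rejected: coverage too low
    ("P3", "P4", 0.42, 0.93),  # passes both cutoffs
]
known = {"P1": {"GO:0003824"}}

clusters = cluster_proteins(proteins, hits)
inferred = transfer_annotations(clusters, known)
```

Requiring high coverage alongside identity is what keeps a shared domain from merging otherwise unrelated multidomain proteins: in the toy data the P2/P3 hit is discarded on coverage alone, so P3 and P4 stay in a separate cluster and receive no annotation.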
Pages: 4362-4371
Page count: 10