Mining GO Annotations for Improving Annotation Consistency

Cited by: 29
Authors
Faria, Daniel [1]
Schlicker, Andreas [2,3]
Pesquita, Catia [1]
Bastos, Hugo [1]
Ferreira, Antonio E. N. [4]
Albrecht, Mario [3,5]
Falcao, Andre O. [1]
Affiliations
[1] Univ Lisbon, Fac Sci, Dept Informat, P-1699 Lisbon, Portugal
[2] Netherlands Canc Inst, Div Mol Carcinogenesis, Amsterdam, Netherlands
[3] Max Planck Inst Informat, Dept Computat Biol & Appl Algorithm, Saarbrucken, Germany
[4] Univ Lisbon, Fac Sci, Dept Chem & Biochem, Ctr Chem & Biochem, P-1699 Lisbon, Portugal
[5] Univ Med Greifswald, Inst Biometr & Med Informat, Dept Bioinformat, Greifswald, Germany
Keywords
DATABASE
DOI
10.1371/journal.pone.0040519
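Chinese Library Classification codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract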
Despite the structure and objectivity provided by the Gene Ontology (GO), the annotation of proteins is a complex task that is subject to errors and inconsistencies. Electronically inferred annotations in particular are widely considered unreliable. However, given that manual curation of all GO annotations is unfeasible, it is imperative to improve the quality of electronically inferred annotations. In this work, we analyze the full set of GO molecular function annotations of UniProtKB proteins and discuss some of the issues that affect their quality, focusing particularly on the lack of annotation consistency. Based on our analysis, we estimate that 64% of the UniProtKB proteins are incompletely annotated, and that inconsistent annotations affect 83% of the protein functions and at least 23% of the proteins. Additionally, we present and evaluate a data mining algorithm, based on the association rule learning methodology, for identifying implicit relationships between molecular function terms. The goal of this algorithm is to assist GO curators in updating GO and in correcting and preventing inconsistent annotations. Our algorithm predicted 501 relationships with an estimated precision of 94%, whereas the basic association rule learning methodology predicted 12,352 relationships with a precision below 9%.
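
The abstract does not describe the algorithm in detail, so the following is only a minimal sketch of the basic association rule learning idea it builds on, not the authors' refined method. It assumes each protein maps to a set of molecular function terms and scores pairwise rules "term A implies term B" by support and confidence. The function name `mine_pairwise_rules`, the thresholds, the protein labels `P1`..`P5`, and the term labels `T1`..`T3` are all hypothetical toy values, not real UniProtKB or GO data.

```python
from collections import Counter
from itertools import combinations


def mine_pairwise_rules(annotations, min_support=0.3, min_confidence=0.8):
    """Return (lhs, rhs, support, confidence) for pairwise rules lhs -> rhs.

    annotations: dict mapping a protein ID to an iterable of term labels.
    support    = fraction of proteins annotated with both terms.
    confidence = proteins with both terms / proteins with the lhs term.
    """
    n = len(annotations)
    term_counts = Counter()
    pair_counts = Counter()
    for terms in annotations.values():
        unique = sorted(set(terms))
        term_counts.update(unique)
        # Count each unordered co-occurring pair once per protein.
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two candidate rules, one per direction.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / term_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Toy data (hypothetical): five proteins and three term labels.
toy_annotations = {
    "P1": {"T1", "T2"},
    "P2": {"T1", "T2"},
    "P3": {"T1", "T2", "T3"},
    "P4": {"T2"},
    "P5": {"T3"},
}

rules = mine_pairwise_rules(toy_annotations, min_support=0.4, min_confidence=0.9)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

On the toy data, only the rule from `T1` to `T2` passes both thresholds, because every protein carrying `T1` also carries `T2` while the reverse does not hold. The abstract reports that this basic methodology alone reached a precision below 9% on the real data, so a sketch like this would need the additional filtering the authors describe in the paper before its output could be useful to curators.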
Pages: 7