Knowledge-based voting algorithm for automated protein functional annotation

Cited by: 7
Authors
Yu, GX
Glass, EM
Karonis, GT
Maltsev, N
Affiliations
[1] Argonne Natl Lab, Div Math & Comp Sci, Argonne, IL 60439 USA
[2] No Illinois Univ, Dept Comp Sci, De Kalb, IL 60115 USA
Keywords
protein function prediction; knowledge system; protein function groups; rules; voting procedure; alternative functional assignments
DOI
10.1002/prot.20652
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular biology]
Discipline classification code
071010; 081704
Abstract
Automated annotation of high-throughput genome sequences is one of the earliest steps toward a comprehensive understanding of the dynamic behavior of living organisms. However, the step is often error-prone because of its underlying algorithms, which rely mainly on a simple similarity analysis, and lack of guidance from biological rules. We present herein a knowledge-based protein annotation algorithm. Our objectives are to reduce errors and to improve annotation confidences. This algorithm consists of two major components: a knowledge system, called "RuleMiner," and a voting procedure. The knowledge system, which includes biological rules and functional profiles for each function, provides a platform for seamless integration of multiple sequence analysis tools and guidance for function annotation. The voting procedure, which relies on the knowledge system, is designed to make (possibly) unbiased judgments in functional assignments among complicated, sometimes conflicting, information. We have applied this algorithm to 10 prokaryotic bacterial genomes and observed a significant improvement in annotation confidences. We also discuss the current limitations of the algorithm and the potential for future improvement.
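The abstract describes the voting procedure only at a high level, so the following is a minimal sketch of one way such a vote could work, not the authors' implementation. It assumes each sequence analysis tool proposes one function per protein, that each tool carries a weight, and that an assignment is accepted only when its share of the total weight exceeds a threshold. The names `vote`, `weights`, and `min_share`, the tool name `motif_tool`, and all numeric values are hypothetical.

```python
from collections import defaultdict


def vote(predictions, weights, min_share=0.5):
    """Pick a function by weighted vote (illustrative sketch only).

    predictions: list of (tool_name, proposed_function) pairs.
    weights: dict mapping tool_name to a vote weight (hypothetical values);
             tools missing from the dict count with weight 1.0.
    min_share: the winner's share of the total weight must exceed this.

    Returns (assigned_function_or_None, list_of_alternative_functions).
    """
    scores = defaultdict(float)
    for tool, function in predictions:
        scores[function] += weights.get(tool, 1.0)

    total = sum(scores.values())
    if total == 0:
        return None, []

    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    best, best_score = ranked[0]

    if best_score / total <= min_share:
        # No clear winner: report every candidate as an alternative.
        return None, [function for function, _ in ranked]
    return best, [function for function, _ in ranked[1:]]


# Hypothetical example: two tools agree, one disagrees.
example_weights = {"blast": 1.0, "pfam": 2.0, "motif_tool": 1.5}
example_predictions = [
    ("blast", "kinase"),
    ("pfam", "kinase"),
    ("motif_tool", "phosphatase"),
]
assigned, alternatives = vote(example_predictions, example_weights)
```

Returning the losing candidates alongside the winner mirrors the "alternative functional assignments" keyword: conflicting evidence is kept for review rather than discarded. How the real system derives weights from its rules and functional profiles is not stated in this record.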
Pages: 907-917
Page count: 11
Related papers
25 in total
[1]   BASIC LOCAL ALIGNMENT SEARCH TOOL [J].
ALTSCHUL, SF ;
GISH, W ;
MILLER, W ;
MYERS, EW ;
LIPMAN, DJ .
JOURNAL OF MOLECULAR BIOLOGY, 1990, 215 (03) :403-410
[2]   Automated genome sequence analysis and annotation [J].
Andrade, MA ;
Brown, NP ;
Leroy, C ;
Hoersch, S ;
de Daruvar, A ;
Reich, C ;
Franchini, A ;
Tamames, J ;
Valencia, A ;
Ouzounis, C ;
Sander, C .
BIOINFORMATICS, 1999, 15 (05) :391-412
[3]   The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000 [J].
Bairoch, A ;
Apweiler, R .
NUCLEIC ACIDS RESEARCH, 2000, 28 (01) :45-48
[4]  
Bateman A, 2004, NUCLEIC ACIDS RES, V32, pD138, DOI [10.1093/nar/gkp985, 10.1093/nar/gkr1065, 10.1093/nar/gkh121]
[5]   Genomes OnLine Database (GOLD): a monitor of genome projects world-wide [J].
Bernal, A ;
Ear, U ;
Kyrpides, N .
NUCLEIC ACIDS RESEARCH, 2001, 29 (01) :126-127
[6]   The gene ontology annotation (GOA) project: Implementation of GO in SWISS-PROT, TrEMBL, and InterPro [J].
Camon, E ;
Magrane, M ;
Barrell, D ;
Binns, D ;
Fleischmann, W ;
Kersey, P ;
Mulder, N ;
Oinn, T ;
Maslen, J ;
Cox, A ;
Apweiler, R .
GENOME RESEARCH, 2003, 13 (04) :662-672
[7]   Synthesis of catalytically active form III ribulose 1,5-bisphosphate carboxylase/oxygenase in archaea [J].
Finn, MW ;
Tabita, FR .
JOURNAL OF BACTERIOLOGY, 2003, 185 (10) :3049-3059
[8]   PEDANTic genome analysis [J].
Frishman, D ;
Mewes, HW .
TRENDS IN GENETICS, 1997, 13 (10) :415-416
[9]   Fully automated genome analysis that reflects user needs and preferences. A detailed introduction to the MAGPIE system architecture [J].
Gaasterland, T ;
Sensen, CW .
BIOCHIMIE, 1996, 78 (05) :302-310
[10]  
Galperin M Y, 1998, In Silico Biol, V1, P55