Combining learning and constraints for genome-wide protein annotation

Cited by: 2
Authors
Teso, Stefano [1]
Masera, Luca [2]
Diligenti, Michelangelo [3]
Passerini, Andrea [2]
Affiliations
[1] KU Leuven, Comp Sci Dept, Celestijnenlaan 200 A Bus 2402, B-3001 Leuven, Belgium
[2] Univ Trento, Dept Informat Engn & Comp Sci, Via Sommar 5, I-38123 Povo, Italy
[3] Univ Siena, Dept Informat Engn & Math, Via Roma 56, I-53100 Siena, Italy
Keywords
Protein function prediction; Protein-protein interaction; Kernel methods; Genome annotation; Gene Ontology; Sequence; Classification; Database; Network; Generation; Prediction; Discovery; Paradigm
DOI
10.1186/s12859-019-2875-5
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: The advent of high-throughput experimental techniques paved the way for genome-wide computational analysis and predictive annotation studies. When jointly annotating a large set of related entities, such as all proteins of a given genome, many candidate annotations may be inconsistent, or very unlikely, given existing knowledge. A sound predictive framework that accounts for this type of constraint when making predictions could substantially improve the quality of machine-generated annotations at a genomic scale.
Results: We present Ocelot, a predictive pipeline that simultaneously addresses functional and interaction annotation of all proteins of a given genome. The system combines sequence-based predictors for function and protein-protein interaction (PPI) prediction with a consistency layer that enforces (soft) constraints expressed as fuzzy logic rules. The enforced rules encode the available prior knowledge about the classification task, including taxonomic constraints over each GO hierarchy (e.g. a protein labeled with a GO term should also be labeled with all ancestor terms) as well as rules combining interaction and function prediction. An extensive experimental evaluation on the Yeast genome shows that integrating prior knowledge via rules substantially improves the quality of the predictions. When GoFDR, the only high-ranking system at the last CAFA challenge with a readily available implementation, is given access to intra-genome information only (as is Ocelot), our system largely outperforms it; when GoFDR is allowed to use information from other genomes, our system achieves comparable or better results, depending on the hierarchy and performance measure. Our system also compares favorably to recent methods based on deep learning.
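To make the idea of rules as soft constraints concrete, the sketch below scores how strongly a set of predicted confidences violates two kinds of rule. It assumes Łukasiewicz fuzzy-logic semantics, which is one common choice for this kind of regularization; the abstract does not state which operators Ocelot uses. All function names, term names (`child`, `parent`, `root`), protein names, and the interaction rule "interacting proteins share a term" are hypothetical illustrations, not taken from the paper or its code.

```python
"""Hypothetical sketch: scoring soft rules as fuzzy-logic penalties.

Assumes Lukasiewicz semantics; the operators used by Ocelot may differ.
All names here are illustrative, not taken from the paper's code.
"""
from typing import Dict, List, Tuple


def lukasiewicz_and(a: float, b: float) -> float:
    """Lukasiewicz t-norm: truth of (a AND b)."""
    return max(0.0, a + b - 1.0)


def implication_violation(premise: float, conclusion: float) -> float:
    """1 - truth of (premise -> conclusion) under Lukasiewicz logic.

    The implication's truth is min(1, 1 - premise + conclusion), so the
    violation is max(0, premise - conclusion).
    """
    return max(0.0, premise - conclusion)


def hierarchy_penalty(scores: Dict[str, float],
                      parents: Dict[str, List[str]]) -> float:
    """Sum of violations of term(p) -> parent(p) for one protein.

    If every child-parent edge has zero violation, the child's score never
    exceeds any ancestor's score, so all ancestor constraints hold too.
    """
    total = 0.0
    for term, term_parents in parents.items():
        for parent in term_parents:
            total += implication_violation(scores.get(term, 0.0),
                                           scores.get(parent, 0.0))
    return total


def interaction_penalty(func: Dict[str, Dict[str, float]],
                        ppi: Dict[Tuple[str, str], float],
                        term: str) -> float:
    """Violations of the hypothetical rule
    interact(p, q) AND term(p) -> term(q), applied in both directions."""
    total = 0.0
    for (p, q), s in ppi.items():
        for a, b in ((p, q), (q, p)):
            premise = lukasiewicz_and(s, func[a][term])
            total += implication_violation(premise, func[b][term])
    return total


if __name__ == "__main__":
    # Toy confidences for one protein over a three-level chain of terms.
    scores = {"child": 0.9, "parent": 0.6, "root": 1.0}
    parents = {"child": ["parent"], "parent": ["root"]}
    print(round(hierarchy_penalty(scores, parents), 3))  # 0.3

    # Two proteins predicted to interact with confidence 0.9.
    func = {"P1": {"T": 0.8}, "P2": {"T": 0.2}}
    ppi = {("P1", "P2"): 0.9}
    print(round(interaction_penalty(func, ppi, "T"), 3))  # 0.5
```

In a regularization-style setup, penalties like these would be added to the training or inference objective so that predictions are pushed toward consistency rather than forced into it; the "soft" wording in the abstract suggests this kind of trade-off, but the exact formulation Ocelot uses is not given here.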
Pages: 14