Using PPI network autocorrelation in hierarchical multi-label classification trees for gene function prediction

Times cited: 42
Authors
Stojanova, Daniela [1 ,2 ]
Ceci, Michelangelo [3 ]
Malerba, Donato [3 ]
Dzeroski, Saso [1 ,2 ,4 ]
Affiliations
[1] Jozef Stefan Inst, Dept Knowledge Technol, Ljubljana, Slovenia
[2] Jozef Stefan Int Postgrad Sch, Ljubljana 1000, Slovenia
[3] Univ Bari Aldo Moro, Dipartimento Informat, Bari, Italy
[4] Ctr Excellence Integrated Approaches Chem & Biol, Ljubljana 1000, Slovenia
Source
BMC BIOINFORMATICS | 2013 / Vol. 14
Keywords
PROTEIN FUNCTION PREDICTION; SPATIAL AUTOCORRELATION; ONTOLOGY; ANNOTATION; DATABASE
DOI
10.1186/1471-2105-14-285
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Ontologies and catalogs of gene functions, such as the Gene Ontology (GO) and MIPS-FUN, assume that functional classes are organized hierarchically; that is, general functions include more specific ones. This has recently motivated the development of several machine learning algorithms for gene function prediction that leverage this hierarchical organization, in which instances may belong to multiple classes. In addition, it is possible to exploit relationships among examples, since it is plausible that related genes tend to share functional annotations. Although these relationships have been identified and extensively studied in the area of protein-protein interaction (PPI) networks, they have not received much attention in hierarchical and multi-class gene function prediction. Relations between genes introduce autocorrelation in functional annotations and violate the assumption that instances are independently and identically distributed (i.i.d.), which underlies most machine learning algorithms. Although the explicit consideration of these relations brings additional complexity to the learning process, we expect substantial benefits in the predictive accuracy of the learned classifiers.
Results: This article demonstrates the benefits (in terms of predictive accuracy) of considering autocorrelation in multi-class gene function prediction. We develop a tree-based algorithm for considering network autocorrelation in the setting of Hierarchical Multi-label Classification (HMC). We empirically evaluate the proposed algorithm, called NHMC (Network Hierarchical Multi-label Classification), on 12 yeast datasets, using each of the MIPS-FUN and GO annotation schemes and exploiting 2 different PPI networks. The results clearly show that taking autocorrelation into account improves the predictive performance of the learned models for predicting gene function.
Conclusions: Our newly developed method for HMC takes network information into account in the learning phase. When used for gene function prediction in the context of PPI networks, the explicit consideration of network autocorrelation increases the predictive performance of the learned models. Overall, we found that this holds for different gene features/descriptions, functional annotation schemes, and PPI networks. The best results are achieved when the PPI network is dense and contains a large proportion of function-relevant interactions.
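The abstract does not specify how NHMC measures autocorrelation. The sketch below is therefore only a generic illustration under stated assumptions, not the NHMC algorithm or its split heuristic.

It does three things:
- It closes each gene's annotations upward under the hierarchy constraint, so a gene in a class also belongs to all of that class's ancestors.
- It computes the standard global Moran's I of each class indicator over a toy PPI graph.
- It reports the graph's density and the fraction of edges whose endpoints share a class.

That last quantity is a hypothetical proxy for "function-relevant interactions". The hierarchy `PARENT`, the gene names, and all helper functions are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical toy class hierarchy (child -> parent); FunCat-style ids for illustration only.
PARENT: Dict[str, str] = {"01.01": "01", "01.03": "01", "02.01": "02"}


def close_labels(labels: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Hierarchy constraint: a gene annotated with a class also belongs to all its ancestors."""
    closed: Set[str] = set()
    for cls in labels:
        closed.add(cls)
        while cls in parent:
            cls = parent[cls]
            closed.add(cls)
    return closed


def morans_i(genes: List[str], edges: List[Edge], y: Dict[str, float]) -> float:
    """Global Moran's I of attribute y on an undirected, unweighted network (w_ij = 1 per edge)."""
    n = len(genes)
    mean = sum(y[g] for g in genes) / n
    dev = {g: y[g] - mean for g in genes}
    denom = sum(d * d for d in dev.values())
    if denom == 0 or not edges:
        raise ValueError("Moran's I is undefined for a constant attribute or an empty graph")
    # Each undirected edge counts twice (w_ij and w_ji), so the weights sum to 2|E|.
    num = 2 * sum(dev[a] * dev[b] for a, b in edges)
    return (n / (2 * len(edges))) * (num / denom)


def density(n_nodes: int, edges: List[Edge]) -> float:
    """Density of a simple undirected graph: |E| / (n(n-1)/2)."""
    possible = n_nodes * (n_nodes - 1) / 2
    return len(edges) / possible if possible else 0.0


def shared_label_fraction(edges: List[Edge], labels: Dict[str, Set[str]]) -> float:
    """Fraction of edges whose endpoints share at least one class (hypothetical relevance proxy)."""
    if not edges:
        return 0.0
    hits = sum(1 for a, b in edges if labels[a] & labels[b])
    return hits / len(edges)


# Toy PPI network (a path g1-g2-g3-g4) and leaf-level annotations; hypothetical data.
genes = ["g1", "g2", "g3", "g4"]
edges: List[Edge] = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]
raw = {"g1": {"01.01"}, "g2": {"01.03"}, "g3": {"02"}, "g4": {"02.01"}}
labels = {g: close_labels(s, PARENT) for g, s in raw.items()}

# One binary indicator per class, scored independently for autocorrelation.
for cls in ["01", "02", "01.01"]:
    y = {g: 1.0 if cls in labels[g] else 0.0 for g in genes}
    print(cls, round(morans_i(genes, edges, y), 3))

print("density", density(len(genes), edges))
print("shared-label fraction", shared_label_fraction(edges, labels))
```

On this toy graph, the top-level classes 01 and 02 get a positive Moran's I, because interacting genes tend to share them. The sparse leaf class 01.01 gets a negative value. Scoring each class separately is a simplification chosen for clarity; a method that handles the whole hierarchy jointly may weight classes differently.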
Pages: 18