NIMEFI: Gene Regulatory Network Inference using Multiple Ensemble Feature Importance Algorithms

Cited by: 51
Authors
Ruyssinck, Joeri [1]
Huynh-Thu, Van Anh [2,3,4]
Geurts, Pierre [2,3]
Dhaene, Tom [1]
Demeester, Piet [1]
Saeys, Yvan [5,6]
Affiliations
[1] Ghent Univ iMinds, Dept Informat Technol, Ghent, Belgium
[2] Univ Liege, Dept Elect Engn & Comp Sci, Liege, Belgium
[3] Univ Liege, GIGA R, Liege, Belgium
[4] Univ Edinburgh, Sch Informat, Edinburgh, Midlothian, Scotland
[5] Univ Ghent VIB, Immunoregulat Lab, Inflammat Res Ctr, Ghent, Belgium
[6] Univ Ghent, Dept Resp Med, B-9000 Ghent, Belgium
Keywords
SELECTION;
DOI
10.1371/journal.pone.0092709
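Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Subject classification code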
07 ; 0710 ; 09 ;
Abstract
One of the long-standing open challenges in computational systems biology is the topology inference of gene regulatory networks from high-throughput omics data. Recently, two community-wide efforts, DREAM4 and DREAM5, have been established to benchmark network inference techniques using gene expression measurements. In these challenges, the overall top performer was the GENIE3 algorithm. This method decomposes the network inference task into separate regression problems for each gene in the network, in which the expression values of a particular target gene are predicted using all other genes as possible predictors. Next, using tree-based ensemble methods, an importance measure for each predictor gene is calculated with respect to the target gene, and a high feature importance is considered as putative evidence of a regulatory link existing between both genes. The contribution of this work is twofold. First, we generalize the regression decomposition strategy of GENIE3 to other feature importance methods. We compare the performance of support vector regression, the elastic net, random forest regression, symbolic regression and their ensemble variants in this setting to the original GENIE3 algorithm. To create the ensemble variants, we propose a subsampling approach which allows us to cast any feature selection algorithm that produces a feature ranking into an ensemble feature importance algorithm. We demonstrate that the ensemble setting is key to the network inference task, as only ensemble variants achieve top performance. As a second contribution, we explore the effect of using rankwise averaged predictions of multiple ensemble algorithms as opposed to only one. We name this approach NIMEFI (Network Inference using Multiple Ensemble Feature Importance algorithms) and show that this approach outperforms all individual methods in general, although on a specific network a single method can perform better. An implementation of NIMEFI has been made publicly available.
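The three ideas in the abstract (one subproblem per target gene, a subsampling ensemble around a feature scorer, and rank-wise averaging of several edge rankings) can be illustrated with a minimal sketch. This is not the published NIMEFI implementation. The function names, the use of absolute Pearson correlation as a stand-in importance score, the choice to subsample only the samples, and the toy data are all assumptions made for illustration; the abstract does not specify these details.

```python
import random

def abs_correlation(x, y):
    """Stand-in importance score (assumption): absolute Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy) / (sxx * syy) ** 0.5

def ensemble_importance(expr, target, score, n_rounds=50, sample_frac=0.8, rng=None):
    """Average the score of every predictor gene over random subsets of the samples."""
    rng = rng or random.Random(0)
    genes = [g for g in expr if g != target]
    n_samples = len(expr[target])
    k = max(2, int(sample_frac * n_samples))
    totals = {g: 0.0 for g in genes}
    for _ in range(n_rounds):
        idx = rng.sample(range(n_samples), k)  # one subsample of the experiments
        y = [expr[target][i] for i in idx]
        for g in genes:
            totals[g] += score([expr[g][i] for i in idx], y)
    return {g: t / n_rounds for g, t in totals.items()}

def infer_network(expr, score, **kwargs):
    """Solve one subproblem per target gene; return {(regulator, target): importance}."""
    edges = {}
    for target in expr:
        for g, imp in ensemble_importance(expr, target, score, **kwargs).items():
            edges[(g, target)] = imp
    return edges

def rank_average(*edge_scores):
    """Combine several edge scorings by averaging ranks (rank 1 = most confident)."""
    combined = {}
    for scores in edge_scores:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            combined[edge] = combined.get(edge, 0.0) + rank / len(edge_scores)
    return sorted(combined, key=combined.get)

# Toy expression data (hypothetical): gene B depends on gene A, gene C is noise.
data_rng = random.Random(1)
a = [data_rng.gauss(0, 1) for _ in range(40)]
b = [2 * v + data_rng.gauss(0, 0.1) for v in a]
c = [data_rng.gauss(0, 1) for _ in range(40)]
expr = {"A": a, "B": b, "C": c}

edges_1 = infer_network(expr, abs_correlation, n_rounds=20)
edges_2 = infer_network(expr, abs_correlation, n_rounds=20, sample_frac=0.6)
ranking = rank_average(edges_1, edges_2)  # candidate edges, most confident first
```

In a setting closer to the one the abstract describes, each argument to `rank_average` would come from a different feature importance method (for example random forest regression or the elastic net) rather than from the same scorer with different subsampling settings, as is done here for brevity.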
Pages: 13
Related papers
36 in total
[1]   Toward a gold standard for promoter prediction evaluation [J].
Abeel, Thomas ;
Van de Peer, Yves ;
Saeys, Yvan .
BIOINFORMATICS, 2009, 25 (12) :I313-I320
[2]   Inferring the conservative causal core of gene regulatory networks [J].
Altay, Goekmen ;
Emmert-Streib, Frank .
BMC SYSTEMS BIOLOGY, 2010, 4
[3]  
Bach F. R., 2008, P 25 INT C MACH LEAR, P33, DOI 10.1145/1390156.1390161
[4]   SmcHD1, containing a structural-maintenance-of-chromosomes hinge domain, has a critical role in X inactivation [J].
Blewitt, Marnie E. ;
Gendrel, Anne-Valerie ;
Pang, Zhenyi ;
Sparrow, Duncan B. ;
Whitelaw, Nadia ;
Craig, Jeffrey M. ;
Apedaile, Anwyn ;
Hilton, Douglas J. ;
Dunwoodie, Sally L. ;
Brockdorff, Neil ;
Kay, Graham F. ;
Whitelaw, Emma .
NATURE GENETICS, 2008, 40 (05) :663-669
[5]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[6]  
Butte A J, 2000, Pac Symp Biocomput, P418
[7]   LIBSVM: A Library for Support Vector Machines [J].
Chang, Chih-Chung ;
Lin, Chih-Jen .
ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2011, 2 (03)
[8]   Bagging Statistical Network Inference from Large-Scale Gene Expression Data [J].
Simoes, Ricardo de Matos ;
Emmert-Streib, Frank .
PLOS ONE, 2012, 7 (03)
[9]   Advantages and limitations of current network inference methods [J].
De Smet, Riet ;
Marchal, Kathleen .
NATURE REVIEWS MICROBIOLOGY, 2010, 8 (10) :717-729
[10]   Minimum redundancy feature selection from microarray gene expression data [J].
Ding, C ;
Peng, HC .
PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, :523-528