Practical aspects of gene regulatory inference via conditional inference forests from expression data

Cited by: 1
Authors
Bessonov, Kyrylo [1]
Van Steen, Kristel [1]
Affiliations
[1] Univ Liege, Med Genom, GIGA R, Ave Hop 1, B-4000 Sart Tilman Par Liege, Belgium
Keywords
biological interactions; conditional inference forests; gene regulatory networks; VARIABLE IMPORTANCE; RECONSTRUCTION; PRINCIPLES; BIOLOGY; PHRASES;
DOI
10.1002/gepi.22017
Chinese Library Classification number
Q3 [Genetics];
Subject classification code
071007; 090102;
Abstract
Gene regulatory network (GRN) inference is an active area of research that facilitates understanding the complex interplays between biological molecules. We propose a novel framework to create such GRNs, based on Conditional Inference Forests (CIFs) as proposed by Strobl et al. Our framework consists of using ensembles of Conditional Inference Trees (CITs) and selecting an appropriate aggregation scheme for variant selection prior to network construction. We show on synthetic microarray data that taking the original implementation of CIFs with conditional permutation scheme (CIFcond) may lead to improved performance compared to Breiman's implementation of Random Forests (RF). Among all newly introduced CIF-based methods and five network scenarios obtained from the DREAM4 challenge, CIFcond performed best. Networks derived from well-tuned CIFs, obtained by simply averaging P-values over tree ensembles (CIFmean) are particularly attractive, because they combine adequate performance with computational efficiency. Moreover, thresholds for variable selection are based on significance levels for P-values and, hence, do not need to be tuned. From a practical point of view, our extensive simulations show the potential advantages of CIFmean-based methods. Although more work is needed to improve on speed, especially when fully exploiting the advantages of CITs in the context of heterogeneous and correlated data, we have shown that CIF methodology can be flexibly inserted in a framework to infer biological interactions. Notably, we confirmed biologically relevant interaction between IL2RA and FOXP1, linked to the IL-2 signaling pathway and to type 1 diabetes.
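The CIFmean idea described in the abstract (average P-values over the tree ensemble, then select edges at a fixed significance level) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `cifmean_edges`, the input layout, and the toy P-values are assumptions made for illustration, and the per-tree P-values are taken as given rather than computed from conditional inference trees.

```python
from statistics import mean


def cifmean_edges(pvalues_by_pair, alpha=0.05):
    """Select candidate edges by averaging per-tree P-values.

    pvalues_by_pair: dict mapping a (regulator, target) pair to the list of
    P-values that pair received across the trees of the ensemble
    (hypothetical input layout).
    alpha: significance level used directly as the selection threshold.
    Returns a dict of selected pairs and their mean P-value.
    """
    selected = {}
    for pair, pvals in pvalues_by_pair.items():
        if not pvals:
            # Pair never tested in any tree: nothing to average, so skip it.
            continue
        avg = mean(pvals)
        if avg <= alpha:
            selected[pair] = avg
    return selected


# Toy P-values for three candidate edges over a three-tree ensemble
# (made-up numbers, for illustration only).
toy = {
    ("G1", "G2"): [0.01, 0.03, 0.02],
    ("G3", "G2"): [0.20, 0.60, 0.40],
    ("G1", "G3"): [0.04, 0.09, 0.05],
}
selected = cifmean_edges(toy, alpha=0.05)
print(sorted(selected))
```

Because the threshold is a significance level rather than a tuned importance cutoff, the only free parameter in this sketch is `alpha`, which mirrors the abstract's point that thresholds do not need tuning.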
Pages: 767-778
Page count: 12
Related papers
44 entries in total
[1]   Differential expression analysis for sequence count data [J].
Anders, Simon ;
Huber, Wolfgang .
GENOME BIOLOGY, 2010, 11 (10)
[2]   [Anonymous], 2006, 23 INT C MACH LEARN, DOI 10.1145/1143844.1143874
[3]   An Integrative Approach to Inferring Gene Regulatory Module Networks [J].
Baitaluk, Michael ;
Kozhenkov, Sergey ;
Ponomarenko, Julia .
PLOS ONE, 2012, 7 (12)
[4]   Letter to the Editor: On the term 'interaction' and related phrases in the literature on Random Forests [J].
Boulesteix, Anne-Laure ;
Janitza, Silke ;
Hapfelmeier, Alexander ;
Van Steen, Kristel ;
Strobl, Carolin .
BRIEFINGS IN BIOINFORMATICS, 2015, 16 (02) :338-345
[5]   Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics [J].
Boulesteix, Anne-Laure ;
Janitza, Silke ;
Kruppa, Jochen ;
Koenig, Inke R. .
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2012, 2 (06) :493-507
[6]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[7]   Chapter 5: Network Biology Approach to Complex Diseases [J].
Cho, Dong-Yeon ;
Kim, Yoo-Ah ;
Przytycka, Teresa M. .
PLOS COMPUTATIONAL BIOLOGY, 2012, 8 (12)
[8]   Gene regulatory networks [J].
Davidson, E ;
Levine, M .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2005, 102 (14) :4935-4935
[9]   Transcription factor regulation can be accurately predicted from the presence of target gene signatures in microarray gene expression data [J].
Essaghir, Ahmed ;
Toffalini, Federica ;
Knoops, Laurent ;
Kallin, Anders ;
van Helden, Jacques ;
Demoulin, Jean-Baptiste .
NUCLEIC ACIDS RESEARCH, 2010, 38 (11) :e120-e120
[10]   Microarray platforms - comparisons and contrasts [J].
Hardiman, G .
PHARMACOGENOMICS, 2004, 5 (05) :487-502