Letter to the Editor: On the term 'interaction' and related phrases in the literature on Random Forests

Cited by: 49
Authors
Boulesteix, Anne-Laure [1]
Janitza, Silke [2]
Hapfelmeier, Alexander [3]
Van Steen, Kristel [4]
Strobl, Carolin [5]
Affiliations
[1] Univ Munich, Computat Mol Med, D-81377 Munich, Germany
[2] Univ Munich, D-81377 Munich, Germany
[3] Tech Univ Munich, Inst Med Stat & Epidemiol, D-80290 Munich, Germany
[4] Univ Liege, Inst Montefiore, B-4000 Liege, Belgium
[5] Univ Zurich, CH-8006 Zurich, Switzerland
Keywords
random forest; statistics; interaction; correlation; conditional inference trees; conditional variable importance; variable importance measures; classification; predictors; regression
DOI
10.1093/bib/bbu012
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
In an interesting and quite exhaustive review on Random Forests (RF) methodology in bioinformatics, Touw et al. address, among other topics, the problem of detecting interactions between variables based on RF methodology. We feel that some important statistical concepts, such as 'interaction', 'conditional dependence' or 'correlation', are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.
Pages: 338-345
Page count: 8
Related papers
32 in total
[1]  
[Anonymous], 2012, Regression for categorical data
[2]  
[Anonymous], 2011, HDB DRIVING SIMULATI
[3]   Criticality of predictors in multiple regression [J].
Azen, R ;
Budescu, DV ;
Reiser, B .
BRITISH JOURNAL OF MATHEMATICAL & STATISTICAL PSYCHOLOGY, 2001, 54 :201-225
[4]   Overview of random forest methodology and practical guidance with emphasis on computational biology and bioinformatics [J].
Boulesteix, Anne-Laure ;
Janitza, Silke ;
Kruppa, Jochen ;
Koenig, Inke R. .
WILEY INTERDISCIPLINARY REVIEWS-DATA MINING AND KNOWLEDGE DISCOVERY, 2012, 2 (06) :493-507
[5]   Random forest Gini importance favours SNPs with large minor allele frequency: impact, sources and recommendations [J].
Boulesteix, Anne-Laure ;
Bender, Andreas ;
Bermejo, Justo Lorenzo ;
Strobl, Carolin .
BRIEFINGS IN BIOINFORMATICS, 2012, 13 (03) :292-304
[6]  
Breiman L, 2004, RANDOM FORESTS ORIGI
[7]  
Breiman L., 1984, CLASSIFICATION REGRE
[8]   Pathway hunting by random survival forests [J].
Chen, Xi ;
Ishwaran, Hemant .
BIOINFORMATICS, 2013, 29 (01) :99-105
[9]   Epistasis: what it means, what it doesn't mean, and statistical methods to detect it in humans [J].
Cordell, HJ .
HUMAN MOLECULAR GENETICS, 2002, 11 (20) :2463-2468
[10]   Gene selection and classification of microarray data using random forest [J].
Díaz-Uriarte, R ;
de Andrés, SA .
BMC BIOINFORMATICS, 2006, 7 (1) : art. no. 3