Letter to the Editor: On the term 'interaction' and related phrases in the literature on Random Forests

Cited by: 49
Authors
Boulesteix, Anne-Laure [1 ]
Janitza, Silke [2 ]
Hapfelmeier, Alexander [3 ]
Van Steen, Kristel [4 ]
Strobl, Carolin [5 ]
Affiliations
[1] Univ Munich, Computat Mol Med, D-81377 Munich, Germany
[2] Univ Munich, D-81377 Munich, Germany
[3] Tech Univ Munich, Inst Med Stat & Epidemiol, D-80290 Munich, Germany
[4] Univ Liege, Inst Montefiore, B-4000 Liege, Belgium
[5] Univ Zurich, CH-8006 Zurich, Switzerland
Keywords
random forest; statistics; interaction; correlation; conditional inference trees; conditional variable importance; VARIABLE IMPORTANCE MEASURES; CLASSIFICATION; PREDICTORS; REGRESSION;
DOI
10.1093/bib/bbu012
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010 ; 081704 ;
Abstract
In an interesting and quite exhaustive review on Random Forests (RF) methodology in bioinformatics, Touw et al. address, among other topics, the problem of the detection of interactions between variables based on RF methodology. We feel that some important statistical concepts, such as 'interaction', 'conditional dependence' or 'correlation', are sometimes employed inconsistently in the bioinformatics literature in general and in the literature on RF in particular. In this letter to the Editor, we aim to clarify some of the central statistical concepts and point out some confusing interpretations concerning RF given by Touw et al. and other authors.
Pages: 338-345
Page count: 8
Related papers
32 in total
[21]  
Miettinen OS., 1985, Theoretical Epidemiology: Principles of Occurrence Research in Medicine
[22]  
Moore HA, 2005, NAT GENET, V37, P13
[23]   Letter to the Editor: On the stability and ranking of predictors from random forest variable importance measures [J].
Nicodemus, Kristin K. .
BRIEFINGS IN BIOINFORMATICS, 2011, 12 (04) :369-373
[24]   CONCEPTS OF INTERACTION [J].
ROTHMAN, KJ ;
GREENLAND, S ;
WALKER, AM .
AMERICAN JOURNAL OF EPIDEMIOLOGY, 1980, 112 (04) :467-470
[25]   Binding Profiles of Chromatin-Modifying Proteins Are Predictive for Transcriptional Activity and Promoter-Proximal Pausing [J].
Sakoparnig, Thomas ;
Kockmann, Tobias ;
Paro, Renato ;
Beisel, Christian ;
Beerenwinkel, Niko .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2012, 19 (02) :126-138
[26]   Genome-wide analysis of A-to-I RNA editing by single-molecule sequencing in Drosophila [J].
St Laurent, Georges ;
Tackett, Michael R. ;
Nechkin, Sergey ;
Shtokalo, Dmitry ;
Antonets, Denis ;
Savva, Yiannis A. ;
Maloney, Rachel ;
Kapranov, Philipp ;
Lawrence, Charles E. ;
Reenan, Robert A. .
NATURE STRUCTURAL & MOLECULAR BIOLOGY, 2013, 20 (11) :1333-U141
[27]   A comprehensive comparison of random forests and support vector machines for microarray-based cancer classification [J].
Statnikov, Alexander ;
Wang, Lily ;
Aliferis, Constantin F. .
BMC BIOINFORMATICS, 2008, 9 (1)
[28]   Bias in random forest variable importance measures: Illustrations, sources and a solution [J].
Strobl, Carolin ;
Boulesteix, Anne-Laure ;
Zeileis, Achim ;
Hothorn, Torsten .
BMC BIOINFORMATICS, 2007, 8 (1)
[29]   An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests [J].
Strobl, Carolin ;
Malley, James ;
Tutz, Gerhard .
PSYCHOLOGICAL METHODS, 2009, 14 (04) :323-348
[30]   Conditional variable importance for random forests [J].
Strobl, Carolin ;
Boulesteix, Anne-Laure ;
Kneib, Thomas ;
Augustin, Thomas ;
Zeileis, Achim .
BMC BIOINFORMATICS, 2008, 9 (1)