Empirical characterization of random forest variable importance measures

Cited by: 746
Authors
Archer, Kellie J. [1]
Kimes, Ryan V. [1]
Affiliations
[1] Virginia Commonwealth Univ, Dept Biostat, Richmond, VA 23298 USA
Keywords
random forest; classification tree; variable importance; bootstrap aggregating
DOI
10.1016/j.csda.2007.08.015
Chinese Library Classification
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
Microarray studies yield data sets consisting of a large number of candidate predictors (genes) on a small number of observations (samples). When interest lies in predicting phenotypic class using gene expression data, often the goals are both to produce an accurate classifier and to uncover the predictive structure of the problem. Most machine learning methods, such as k-nearest neighbors, support vector machines, and neural networks, are useful for classification. However, these methods provide no insight regarding the covariates that best contribute to the predictive structure. Other methods, such as linear discriminant analysis, require the predictor space be substantially reduced prior to deriving the classifier. A recently developed method, random forests (RF), does not require reduction of the predictor space prior to classification. Additionally, RF yield variable importance measures for each candidate predictor. This study examined the effectiveness of RF variable importance measures in identifying the true predictor among a large number of candidate predictors. An extensive simulation study was conducted using 20 levels of correlation among the predictor variables and 7 levels of association between the true predictor and the dichotomous response. We conclude that the RF methodology is attractive for use in classification problems when the goals of the study are to produce an accurate classifier and to provide insight regarding the discriminative ability of individual predictor variables. Such goals are common among microarray studies, and therefore application of the RF methodology for the purpose of obtaining variable importance measures is demonstrated on a microarray data set. (c) 2007 Elsevier B.V. All rights reserved.
Pages: 2249-2260
Page count: 12
Related papers
50 in total
  • [21] Variable Importance Assessment in Regression: Linear Regression versus Random Forest
    Groemping, Ulrike
    AMERICAN STATISTICIAN, 2009, 63 (04): : 308 - 319
  • [22] NPP estimation using random forest and impact feature variable importance analysis
    Yu, Bo
    Chen, Fang
    Chen, Hanyue
    JOURNAL OF SPATIAL SCIENCE, 2019, 64 (01) : 173 - 192
  • [23] Using a Random Forest proximity measure for variable importance stratification in genotypic data
    Seoane, Jose A.
    Day, Ian N. M.
    Campbell, Colin
    Casas, Juan P.
    Gaunt, Tom R.
    PROCEEDINGS IWBBIO 2014: INTERNATIONAL WORK-CONFERENCE ON BIOINFORMATICS AND BIOMEDICAL ENGINEERING, VOLS 1 AND 2, 2014, : 1049 - 1060
  • [24] WEAK INVARIANCE-PRINCIPLES FOR EMPIRICAL MEASURES OF A RANDOM-VARIABLE MIXING SEQUENCE
    DOUKHAN, P
    LEON, JR
    PORTAL, F
    PROBABILITY THEORY AND RELATED FIELDS, 1987, 76 (01) : 51 - 70
  • [26] Identification of influential rare variants in aggregate testing using random forest importance measures
    Blumhagen, Rachel Z.
    Schwartz, David A.
    Langefeld, Carl D.
    Fingerlin, Tasha E.
    ANNALS OF HUMAN GENETICS, 2023, 87 (04) : 184 - 195
  • [27] VARIABLE IMPORTANCE AND RANDOM FOREST CLASSIFICATION USING RADARSAT-2 POLSAR DATA
    Hariharan, Siddharth
    Tirodkar, Siddhesh
    De, Shaunak
    Bhattacharya, Avik
    2014 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS), 2014, : 1210 - 1213
  • [28] Standard errors and confidence intervals for variable importance in random forest regression, classification, and survival
    Ishwaran, Hemant
    Lu, Min
    STATISTICS IN MEDICINE, 2019, 38 (04) : 558 - 582
  • [29] Random forest and variable importance rankings for correlated survival data, with applications to tooth loss
    Hallett, M. J.
    Fan, J. J.
    Su, X. G.
    Levine, R. A.
    Nunn, M. E.
    STATISTICAL MODELLING, 2014, 14 (06) : 523 - 547
  • [30] Environmental variable importance for under-five mortality in Malaysia: A random forest approach
    Phung, Vera Ling Hui
    Oka, Kazutaka
    Hijioka, Yasuaki
    Ueda, Kayo
    Sahani, Mazrura
    Mahiyuddin, Wan Rozita Wan
    SCIENCE OF THE TOTAL ENVIRONMENT, 2022, 845