Dimension Reduction Forests: Local Variable Importance Using Structured Random Forests

Cited by: 5
Authors
Loyal, Joshua Daniel [1]
Zhu, Ruoqing [1]
Cui, Yifan [2]
Zhang, Xin [3]
Affiliations
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore, Singapore
[3] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Keywords
Random forests; Sufficient dimension reduction; Variable importance; Sliced inverse regression
DOI
10.1080/10618600.2022.2069777
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract
Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function's local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.
Pages: 1104-1113
Page count: 10