Dimension Reduction Forests: Local Variable Importance Using Structured Random Forests

Cited by: 5
Authors
Loyal, Joshua Daniel [1 ]
Zhu, Ruoqing [1 ]
Cui, Yifan [2 ]
Zhang, Xin [3 ]
Affiliations
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
[2] Natl Univ Singapore, Dept Stat & Data Sci, Singapore, Singapore
[3] Florida State Univ, Dept Stat, Tallahassee, FL 32306 USA
Keywords
Random forests; Sufficient dimension reduction; Variable importance; Sliced inverse regression
DOI
10.1080/10618600.2022.2069777
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Random forests are one of the most popular machine learning methods due to their accuracy and variable importance assessment. However, random forests only provide variable importance in a global sense. There is an increasing need for such assessments at a local level, motivated by applications in personalized medicine, policy-making, and bioinformatics. We propose a new nonparametric estimator that pairs the flexible random forest kernel with local sufficient dimension reduction to adapt to a regression function's local structure. This allows us to estimate a meaningful directional local variable importance measure at each prediction point. We develop a computationally efficient fitting procedure and provide sufficient conditions for the recovery of the splitting directions. We demonstrate significant accuracy gains of our proposed estimator over competing methods on simulated and real regression problems. Finally, we apply the proposed method to seasonal particulate matter concentration data collected in Beijing, China, which yields meaningful local importance measures. The methods presented here are available in the drforest Python package. Supplementary materials for this article are available online.
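The two ingredients the abstract names, the random forest kernel and local sufficient dimension reduction, can be illustrated with a minimal sketch. This is not the drforest implementation; it is a conceptual illustration using scikit-learn, where `forest_kernel_weights` and `local_sir_direction` are hypothetical helper names: the kernel weight of a training point is the fraction of trees in which it shares a leaf with the query point, and those weights feed a locally weighted sliced inverse regression whose leading direction plays the role of a directional local importance measure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy data: y depends on X only through a single direction beta.
rng = np.random.default_rng(0)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta = np.array([1.0, 1.0, 0.0, 0.0, 0.0]) / np.sqrt(2.0)
y = np.sin(X @ beta) + 0.1 * rng.normal(size=n)

forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

def forest_kernel_weights(forest, X_train, x0):
    """Random forest kernel: fraction of trees in which each training
    point falls in the same leaf as the query point x0."""
    train_leaves = forest.apply(X_train)             # shape (n, n_trees)
    query_leaves = forest.apply(x0.reshape(1, -1))   # shape (1, n_trees)
    return (train_leaves == query_leaves).mean(axis=1)

def local_sir_direction(X, y, w, n_slices=5):
    """Weighted sliced inverse regression: leading direction of the
    local central subspace, estimated from kernel-weighted data."""
    keep = w > 0
    Xk, yk, wk = X[keep], y[keep], w[keep]
    wk = wk / wk.sum()
    mu = wk @ Xk
    Xc = Xk - mu
    Sigma = (Xc * wk[:, None]).T @ Xc                # weighted covariance
    # Slice on y and accumulate weighted slice means of centered X.
    edges = np.quantile(yk, np.linspace(0.0, 1.0, n_slices + 1))
    M = np.zeros((X.shape[1], X.shape[1]))
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_slice = (yk >= lo) & (yk <= hi)
        ws = wk[in_slice].sum()
        if ws <= 0:
            continue
        m = (wk[in_slice] @ Xc[in_slice]) / ws
        M += ws * np.outer(m, m)
    # Leading eigenvector of Sigma^{-1} M (small ridge for stability).
    evals, evecs = np.linalg.eig(
        np.linalg.solve(Sigma + 1e-8 * np.eye(X.shape[1]), M))
    d = evecs[:, np.argmax(evals.real)].real
    return d / np.linalg.norm(d)

x0 = np.zeros(p)
w = forest_kernel_weights(forest, X, x0)
direction = local_sir_direction(X, y, w)
print(np.round(direction, 2))
```

In this toy example the recovered unit vector should align (up to sign) with `beta`, since near `x0` the response varies only along that direction. The paper's actual estimator differs in important ways, notably by building the forest with structured splits along estimated directions rather than post-processing a standard forest.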
Pages: 1104-1113 (10 pages)