Targeting predictors in random forest regression

Cited by: 55
Authors
Borup, Daniel [1,2,3]
Christensen, Bent Jesper [1,3,4]
Muhlbach, Nicolaj Sondergaard [1,5]
Nielsen, Mikkel Slot [1,6]
Affiliations
[1] CREATES, Aarhus, Denmark
[2] Aarhus Univ, Dept Econ & Business Econ, Fuglesangs Alle 4, DK-8210 Aarhus V, Denmark
[3] Danish Finance Inst DFI, Aarhus, Denmark
[4] Aarhus Univ, Dale T Mortensen Ctr, Dept Econ & Business Econ, Aarhus, Denmark
[5] MIT, Dept Econ, Cambridge, MA 02139 USA
[6] Columbia Univ, Dept Stat, New York, NY 10027 USA
Keywords
Random forests; Targeted predictors; High-dimensional forecasting; Weak predictors; Variable selection; Content horizons; Large number; Shrinkage
DOI
10.1016/j.ijforecast.2022.02.010
Chinese Library Classification (CLC) number
F [Economics]
Subject classification code
02
Abstract
Random forest (RF) regression is an extremely popular tool for analyzing high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using finite representative samples. Moreover, we quantify the immediate gain from targeting in terms of the increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 5%-30% of commonly applied predictors. Improvements in the predictive accuracy of targeted RF relative to ordinary RF are considerable, up to 21%, and occur in both recessions and expansions, particularly at long horizons.
© 2022 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
Pages: 841-868
Number of pages: 28
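The abstract describes targeting only at a high level: a dimension reduction step that keeps the best fraction of predictors before the random forest is fitted. As a minimal sketch, assuming the simplest possible screening rule (rank each predictor by its absolute sample correlation with the target and keep the top `fraction`), the targeting step might look like the code below. The correlation criterion and the names `abs_corr`, `target_predictors`, `subset_columns`, and `fraction` are illustrative assumptions, not necessarily the targeting procedure used in the paper.

```python
import math
from typing import List, Sequence


def abs_corr(x: Sequence[float], y: Sequence[float]) -> float:
    """Absolute Pearson correlation between x and y (0.0 for a constant series)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return abs(sxy) / math.sqrt(sxx * syy)


def target_predictors(X: List[List[float]], y: Sequence[float],
                      fraction: float) -> List[int]:
    """Return the column indices of the top `fraction` of predictors.

    ASSUMPTION: predictors are ranked by |corr(x_j, y)|, a simple marginal
    screening rule chosen for illustration only.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must lie in (0, 1]")
    p = len(X[0])
    k = max(1, round(fraction * p))  # keep at least one predictor
    scores = [abs_corr([row[j] for row in X], y) for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])  # report kept columns in their original order


def subset_columns(X: List[List[float]], cols: List[int]) -> List[List[float]]:
    """Keep only the targeted columns; this reduced matrix is what the RF would see."""
    return [[row[j] for j in cols] for row in X]
```

For example, with 100 candidate predictors and `fraction=0.1`, ten columns are retained. The reduced matrix would then be passed to an ordinary random forest (for instance scikit-learn's `RandomForestRegressor`), which still applies its own feature sampling at each split. In the abstract's terms, `fraction` plays the role of the degree of targeting, which the applications found to work best at roughly 5%-30%.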