Targeting predictors in random forest regression

Cited by: 55
Authors
Borup, Daniel [1 ,2 ,3 ]
Christensen, Bent Jesper [1 ,3 ,4 ]
Muhlbach, Nicolaj Sondergaard [1 ,5 ]
Nielsen, Mikkel Slot [1 ,6 ]
Affiliations
[1] CREATES, Aarhus, Denmark
[2] Aarhus Univ, Dept Econ & Business Econ, Fuglesangs Alle 4, DK-8210 Aarhus V, Denmark
[3] Danish Finance Inst DFI, Aarhus, Denmark
[4] Aarhus Univ, Dale T Mortensen Ctr, Dept Econ & Business Econ, Aarhus, Denmark
[5] MIT, Dept Econ, Cambridge, MA 02139 USA
[6] Columbia Univ, Dept Stat, New York, NY 10027 USA
Keywords
Random forests; Targeted predictors; High-dimensional forecasting; Weak predictors; Variable selection; Content horizons; Large number; Shrinkage
DOI
10.1016/j.ijforecast.2022.02.010
Chinese Library Classification (CLC)
F [Economics]
Discipline classification code
02
Abstract
Random forest (RF) regression is an extremely popular tool for analyzing high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using finite representative samples. Moreover, we quantify the immediate gain from targeting in terms of the increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 5%-30% of commonly applied predictors. Improvements in the predictive accuracy of targeted RF relative to ordinary RF are considerable, up to 21%, occurring both in recessions and expansions, particularly at long horizons. © 2022 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
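The two ideas in the abstract can be illustrated with a rough sketch. This is not the authors' procedure: the abstract does not name the targeting rule, so the sketch assumes a simple absolute-correlation screen, and the function names (abs_correlation, target_predictors, prob_strong_candidate) are hypothetical. The probability computed here is only the chance that a node's candidate set contains at least one strong predictor when mtry candidates are drawn uniformly without replacement; it is a simpler quantity than the split-placement probability the paper studies.

```python
from math import comb, sqrt


def abs_correlation(x, y):
    """Absolute sample correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant series carries no linear signal
    return abs(sxy) / sqrt(sxx * syy)


def target_predictors(columns, y, fraction):
    """Keep the given fraction of predictors, ranked by |corr| with y.

    ASSUMPTION: a correlation screen stands in for the targeting step.
    `columns` is a list of predictor columns; returns sorted kept indices.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    keep = max(1, round(fraction * len(columns)))
    ranked = sorted(
        range(len(columns)),
        key=lambda j: abs_correlation(columns[j], y),
        reverse=True,
    )
    return sorted(ranked[:keep])


def prob_strong_candidate(p, s, mtry):
    """P(at least one of s strong predictors is among mtry candidates).

    Candidates are drawn uniformly without replacement from p predictors,
    so the count of strong candidates is hypergeometric.
    """
    if not (0 <= s <= p and 1 <= mtry <= p):
        raise ValueError("need 0 <= s <= p and 1 <= mtry <= p")
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


if __name__ == "__main__":
    # Toy comparison with mtry held fixed: 5 strong predictors among 100,
    # versus the same 5 retained after targeting down to 20 predictors.
    print(prob_strong_candidate(100, 5, 6))
    print(prob_strong_candidate(20, 5, 6))
```

With mtry held fixed, shrinking the predictor set while retaining the strong predictors raises this inclusion probability. The indices returned by target_predictors would then be used to subset the predictors before fitting any random forest implementation.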
Pages: 841-868
Number of pages: 28
Related papers
50 in total
  • [11] Evaluation of Random Forest in Crime Prediction: Comparing Three-Layered Random Forest and Logistic Regression
    Oh, Gyeongseok
    Song, Juyoung
    Park, Hyoungah
    Na, Chongmin
    DEVIANT BEHAVIOR, 2022, 43 (09) : 1036 - 1049
  • [12] Estimation of Maize Yield Based on Random Forest Regression
    Wang P.
    Qi X.
    Li L.
    Wang L.
    Xu L.
    Nongye Jixie Xuebao/Transactions of the Chinese Society for Agricultural Machinery, 2019, 50 (07): 237 - 245
  • [13] Random forest regression for magnetic resonance image synthesis
    Jog, Amod
    Carass, Aaron
    Roy, Snehashis
    Pham, Dzung L.
    Prince, Jerry L.
    MEDICAL IMAGE ANALYSIS, 2017, 35 : 475 - 488
  • [14] Pier scour modelling using random forest regression
    Pal, M. (mpce_pal@yahoo.co.uk), 1600, Taylor and Francis Ltd. (19):
  • [15] A random forest approach for interval selection in functional regression
    Servien, Remi
    Vialaneix, Nathalie
    STATISTICAL ANALYSIS AND DATA MINING, 2024, 17 (04)
  • [16] Optimal Feature Set Size in Random Forest Regression
    Han, Sunwoo
    Kim, Hyunjoong
    APPLIED SCIENCES-BASEL, 2021, 11 (08):
  • [17] Application of random forest regression to spectral multivariate calibration
    Ghasemi, Jahan B.
    Tavakoli, Hossein
    ANALYTICAL METHODS, 2013, 5 (07) : 1863 - 1871
  • [18] Random Forest Regression Based on Partial Least Squares
    Hao, Zhulin
    Du, Jianqiang
    Nie, Bin
    Yu, Fang
    Yu, Riyue
    Xiong, Wangping
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, 2016, 127
  • [19] Logistic Regression and Random Forest for Effective Imbalanced Classification
    Luo, Hanwu
    Pan, Xiubao
    Wang, Qingshun
    Ye, Shasha
    Qian, Ying
    2019 IEEE 43RD ANNUAL COMPUTER SOFTWARE AND APPLICATIONS CONFERENCE (COMPSAC), VOL 1, 2019, : 916 - 917
  • [20] Approximating Prediction Uncertainty for Random Forest Regression Models
    Coulston, John W.
    Blinn, Christine E.
    Thomas, Valerie A.
    Wynne, Randolph H.
    PHOTOGRAMMETRIC ENGINEERING AND REMOTE SENSING, 2016, 82 (03): 189 - 197