Targeting predictors in random forest regression

Cited by: 54
Authors
Borup, Daniel [1, 2, 3]
Christensen, Bent Jesper [1, 3, 4]
Muhlbach, Nicolaj Sondergaard [1, 5]
Nielsen, Mikkel Slot [1, 6]
Affiliations
[1] CREATES, Aarhus, Denmark
[2] Aarhus Univ, Dept Econ & Business Econ, Fuglesangs Alle 4, DK-8210 Aarhus V, Denmark
[3] Danish Finance Inst DFI, Aarhus, Denmark
[4] Aarhus Univ, Dale T Mortensen Ctr, Dept Econ & Business Econ, Aarhus, Denmark
[5] MIT, Dept Econ, Cambridge, MA 02139 USA
[6] Columbia Univ, Dept Stat, New York, NY 10027 USA
Keywords
Random forests; Targeted predictors; High-dimensional forecasting; Weak predictors; Variable selection; VARIABLE SELECTION; CONTENT HORIZONS; LARGE NUMBER; SHRINKAGE;
DOI
10.1016/j.ijforecast.2022.02.010
Chinese Library Classification (CLC) number
F [Economics]
Discipline classification code
02
Abstract
Random forest (RF) regression is an extremely popular tool for analyzing high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using finite representative samples. Moreover, we quantify the immediate gain from targeting in terms of the increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 5%-30% of commonly applied predictors. Improvements in the predictive accuracy of targeted RF relative to ordinary RF are considerable, up to 21%, occurring both in recessions and expansions, particularly at long horizons. (c) 2022 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
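As a rough illustration of the abstract's point that targeting interacts with RF's feature sampling, the sketch below computes the probability that a random candidate set of mtry predictors, drawn without replacement at a split, contains at least one strong predictor. It is not the paper's derivation. The function name, the parameter values (100 predictors, 5 of them strong, mtry = 6), and the idealization that targeting retains every strong predictor are all assumptions made for this example.

```python
from math import comb


def prob_strong_candidate(p: int, s: int, mtry: int) -> float:
    """Probability that a draw of `mtry` out of `p` predictors (without
    replacement) contains at least one of the `s` strong predictors.

    Hypothetical helper for illustration only; it is the complement of the
    hypergeometric probability of drawing zero strong predictors.
    """
    if not (0 <= s <= p and 1 <= mtry <= p):
        raise ValueError("need 0 <= s <= p and 1 <= mtry <= p")
    # comb(p - s, mtry) is 0 when mtry > p - s, so the result is then 1.0.
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


# Assumed setting: 100 predictors, 5 strong, mtry held fixed at 6.
untargeted = prob_strong_candidate(p=100, s=5, mtry=6)

# Assumed targeting step: keep the best 20% of predictors (20 of 100),
# idealized so that all 5 strong predictors survive the selection.
targeted = prob_strong_candidate(p=20, s=5, mtry=6)

print(f"untargeted: {untargeted:.3f}")
print(f"targeted:   {targeted:.3f}")
```

Under these assumptions, shrinking the pool from 100 to 20 predictors while holding mtry fixed raises the chance that a strong predictor is available at a split. If mtry is instead scaled down in proportion to the pool (for example, 33 of 100 versus 6 of 20), this particular probability changes little, so the sketch captures only one simple channel and says nothing about the correlation among trees.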
Pages: 841-868
Page count: 28
Related papers
50 in total
  • [41] Random Forest Regression in Predicting Students' Achievements and Fuzzy Grades
    Doz, Daniel
    Cotic, Mara
    Felda, Darjo
    MATHEMATICS, 2023, 11 (19)
  • [42] Daily Evapotranspiration Mapping Using Regression Random Forest Models
    Gonzalo-Martin, Consuelo
    Lillo-Saavedra, Mario
    Garcia-Pedrero, Angel
    Lagos, Octavio
    Menasalvas, Ernestina
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2017, 10 (12) : 5359 - 5368
  • [43] Quantile Regression Random Forest Hybrids based Data Imputation
    Yadav, Manish
    Ravi, Vadlamani
    PROCEEDINGS OF 2018 IEEE 17TH INTERNATIONAL CONFERENCE ON COGNITIVE INFORMATICS & COGNITIVE COMPUTING (ICCI*CC 2018), 2018, : 195 - 201
  • [44] Hand Orientation Regression Using Random Forest for Augmented Reality
    Asad, Muhammad
    Slabaugh, Greg
    AUGMENTED AND VIRTUAL REALITY, AVR 2014, 2014, 8853 : 159 - 174
  • [45] Radio Environment Map Construction Based on Random Forest Regression
    Du, Yixiao
    Wang, Hongjun
    Liu, Jinfan
    International Conference on Communication Technology Proceedings, ICCT, 2022, 2022-November : 551 - 556
  • [46] Chemometric versus random forest predictors of ionic liquid toxicity
    University of Zagreb, Faculty of Food Technology and Biotechnology, Pierottijeva 6, Zagreb 10000, Croatia
    Chem Biochem Eng Q, 4 (459-463):
  • [47] Chemometric versus Random Forest Predictors of Ionic Liquid Toxicity
    Kurtanjek, Z.
    CHEMICAL AND BIOCHEMICAL ENGINEERING QUARTERLY, 2014, 28 (04) : 459 - 463
  • [48] Evaluation of Random Subspace and Random Forest Regression Models Based on Genetic Fuzzy Systems
    Lasota, Tadeusz
    Telec, Zbigniew
    Trawinski, Bogdan
    Trawinski, Grzegorz
    ADVANCES IN KNOWLEDGE-BASED AND INTELLIGENT INFORMATION AND ENGINEERING SYSTEMS, 2012, 243 : 88 - 97
  • [49] Some results on random design regression with long memory errors and predictors
    Kulik, Rafal
    Lorek, Pawel
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2011, 141 (01) : 508 - 523
  • [50] Nonparametric regression with predictors missing at random and the scale depending on auxiliary covariates
    Jiang, Tian
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2025, 239