Targeting predictors in random forest regression

Cited by: 55
Authors
Borup, Daniel [1, 2, 3]
Christensen, Bent Jesper [1, 3, 4]
Muhlbach, Nicolaj Sondergaard [1, 5]
Nielsen, Mikkel Slot [1, 6]
Affiliations
[1] CREATES, Aarhus, Denmark
[2] Aarhus Univ, Dept Econ & Business Econ, Fuglesangs Alle 4, DK-8210 Aarhus V, Denmark
[3] Danish Finance Inst DFI, Aarhus, Denmark
[4] Aarhus Univ, Dale T Mortensen Ctr, Dept Econ & Business Econ, Aarhus, Denmark
[5] MIT, Dept Econ, Cambridge, MA 02139 USA
[6] Columbia Univ, Dept Stat, New York, NY 10027 USA
Keywords
Random forests; Targeted predictors; High-dimensional forecasting; Weak predictors; Variable selection; Content horizons; Large number; Shrinkage
DOI
10.1016/j.ijforecast.2022.02.010
Chinese Library Classification
F [Economics]
Subject classification code
02
Abstract
Random forest (RF) regression is an extremely popular tool for analyzing high-dimensional data. Nonetheless, its benefits may be lessened in sparse settings due to weak predictors, and a pre-estimation dimension reduction (targeting) step is required. We show that proper targeting controls the probability of placing splits along strong predictors, thus providing an important complement to RF's feature sampling. This is supported by simulations using finite representative samples. Moreover, we quantify the immediate gain from targeting in terms of the increased strength of individual trees. Macroeconomic and financial applications show that the bias-variance trade-off implied by targeting, due to increased correlation among trees in the forest, is balanced at a medium degree of targeting, selecting the best 5%-30% of commonly applied predictors. Improvements in the predictive accuracy of targeted RF relative to ordinary RF are considerable, up to 21%, occurring both in recessions and expansions, particularly at long horizons. © 2022 International Institute of Forecasters. Published by Elsevier B.V. All rights reserved.
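The interaction between targeting and RF's feature sampling can be illustrated with a simple counting argument. The sketch below is not the paper's derivation. It assumes that each split draws `mtry` candidate predictors uniformly without replacement, that `s` of the `p` predictors are strong, and that targeting retains every strong predictor (an idealized case). The function name and all numbers are hypothetical. It computes the probability that at least one strong predictor is among the candidates at a given split, which is a precondition for splitting along a strong predictor.

```python
from math import comb


def prob_strong_candidate(p: int, s: int, mtry: int) -> float:
    """Probability that at least one of s strong predictors is among
    mtry candidates drawn without replacement from p predictors."""
    if not (0 <= s <= p and 1 <= mtry <= p):
        raise ValueError("need 0 <= s <= p and 1 <= mtry <= p")
    # comb(p - s, mtry) counts draws containing only weak predictors;
    # it is 0 when mtry > p - s, giving probability 1.
    return 1.0 - comb(p - s, mtry) / comb(p, mtry)


# Hypothetical setting: 100 predictors, 5 of them strong, mtry held fixed at 5.
p_total, s_strong, mtry = 100, 5, 5
for keep_share in (1.00, 0.30, 0.10, 0.05):
    # Assumption: targeting keeps this share of predictors, including all strong ones.
    kept = max(s_strong, round(keep_share * p_total))
    prob = prob_strong_candidate(kept, s_strong, min(mtry, kept))
    print(f"kept {kept:3d} predictors -> P(strong candidate) = {prob:.3f}")
```

Under these assumptions the probability rises as the retained set shrinks. The sketch does not capture the cost noted in the abstract, namely the increased correlation among trees, nor the risk that an imperfect targeting step discards strong predictors.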
Pages: 841-868
Number of pages: 28