Optimal Feature Set Size in Random Forest Regression

Cited by: 28
Authors
Han, Sunwoo [1]
Kim, Hyunjoong [2]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Volume 11 / Issue 08
Funding
National Research Foundation, Singapore;
Keywords
random forest; feature set size; grid search; regression; PREDICTION;
DOI
10.3390/app11083428
Chinese Library Classification number
O6 [Chemistry];
Subject classification code
0703;
Abstract
One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size, the number of candidate features searched for the best partitioning rule at each node of a tree. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether RF regression predictions are affected by the feature set size. Then, we found a rule associated with the optimal size based on the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.
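As a rough illustration of the baseline the abstract calls the common grid search method, the sketch below tries every feature set size from 1 to p and compares out-of-bag (OOB) error. It is not the search algorithm proposed in the paper, which this record does not describe. It assumes scikit-learn's RandomForestRegressor (where the feature set size is `max_features`) rather than the randomForest R package, uses synthetic data, and the helper name `oob_mse_by_feature_set_size` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor


def oob_mse_by_feature_set_size(X, y, sizes, n_trees=100, seed=0):
    """Return {m: out-of-bag MSE} for each candidate feature set size m."""
    results = {}
    for m in sizes:
        forest = RandomForestRegressor(
            n_estimators=n_trees,
            max_features=m,   # features tried at each split ("mtry" in R)
            bootstrap=True,
            oob_score=True,   # keep out-of-bag predictions for scoring
            random_state=seed,
        )
        forest.fit(X, y)
        results[m] = float(np.mean((y - forest.oob_prediction_) ** 2))
    return results


# Synthetic data, for illustration only (not one of the paper's datasets).
X, y = make_friedman1(n_samples=300, n_features=10, noise=1.0, random_state=0)
p = X.shape[1]

# Regression default in R's randomForest: max(floor(p / 3), 1).
default_size = max(p // 3, 1)

oob_mse = oob_mse_by_feature_set_size(X, y, sizes=range(1, p + 1))
best_size = min(oob_mse, key=oob_mse.get)

print(f"default size {default_size}: OOB MSE = {oob_mse[default_size]:.3f}")
print(f"best size {best_size}: OOB MSE = {oob_mse[best_size]:.3f}")
```

Scoring on OOB predictions avoids a separate validation split, but the exhaustive grid fits one forest per candidate size, so its cost grows linearly with the number of features.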
Pages: 13