Optimal Feature Set Size in Random Forest Regression

Cited by: 28
Authors
Han, Sunwoo [1]
Kim, Hyunjoong [2]
Affiliations
[1] Fred Hutchinson Canc Res Ctr, Vaccine & Infect Dis Div, Seattle, WA 98006 USA
[2] Yonsei Univ, Dept Appl Stat, Seoul 03722, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 08
Funding
National Research Foundation of Singapore;
Keywords
random forest; feature set size; grid search; regression; PREDICTION;
DOI
10.3390/app11083428
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
One of the most important hyper-parameters in the Random Forest (RF) algorithm is the feature set size, the number of candidate features considered when searching for the best partitioning rule at each node of a tree. Most existing research on feature set size has focused on classification problems. We studied the effect of feature set size in the context of regression. Through experimental studies using many datasets, we first investigated whether RF regression predictions are affected by the feature set size. Then, we found a rule relating the optimal size to the characteristics of each dataset. Lastly, we developed a search algorithm for estimating the best feature set size in RF regression. We showed that the proposed search algorithm can provide improvements over other choices, such as using the default size specified in the randomForest R package and using the common grid search method.
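The two baselines named in the abstract can be sketched as follows. This is an illustration only, not the authors' proposed search algorithm, which the abstract does not specify. The regression default in the randomForest R package is max(floor(p/3), 1) for p features. The function names, the `step` parameter, and the `error_fn` callback are assumptions introduced here; in practice `error_fn` would fit an RF with the given feature set size and return an error estimate such as out-of-bag mean squared error.

```python
from typing import Callable, Dict, Tuple


def default_mtry_regression(p: int) -> int:
    """Regression default of the randomForest R package: max(floor(p/3), 1)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    return max(p // 3, 1)


def grid_search_mtry(
    p: int,
    error_fn: Callable[[int], float],  # hypothetical: maps a feature set size to an error estimate
    step: int = 1,
) -> Tuple[int, Dict[int, float]]:
    """Evaluate error_fn on a grid of feature set sizes and return the best one.

    The grid runs from 1 to p in increments of `step` and always includes p,
    the case in which every feature is a candidate at each split.
    """
    if p < 1 or step < 1:
        raise ValueError("p and step must be at least 1")
    candidates = list(range(1, p + 1, step))
    if candidates[-1] != p:
        candidates.append(p)
    errors = {m: error_fn(m) for m in candidates}
    # On ties, the smallest feature set size wins because dicts keep insertion order.
    best = min(errors, key=errors.get)
    return best, errors


if __name__ == "__main__":
    # Toy error curve (made up for illustration) with its minimum at size 4.
    toy_error = lambda m: (m - 4) ** 2 + 1.0
    best, errors = grid_search_mtry(10, toy_error)
    print("default:", default_mtry_regression(10), "grid search best:", best)
```

With the toy curve, the default size (3) and the grid-search choice (4) differ, which is the kind of gap the abstract's comparison is concerned with. A full grid costs one RF fit per candidate size, so a coarser `step` trades accuracy for time.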
Pages: 13