Distance Correlation-Based Feature Selection in Random Forest

Cited by: 17
Authors
Ratnasingam, Suthakaran [1]
Munoz-Lopez, Jose [1]
Affiliations
[1] Calif State Univ San Bernardino, Dept Math, San Bernardino, CA 92407 USA
Keywords
feature selection; random forest; Pearson correlation; distance correlation
DOI
10.3390/e25091250
Chinese Library Classification number
O4 [Physics]
Discipline classification code
0702
Abstract
The Pearson correlation coefficient (ρ) is a commonly used measure of correlation, but it is limited in that it captures only the linear relationship between two numerical variables. Distance correlation, by contrast, measures all types of dependence between random vectors X and Y of arbitrary dimension, not just linear dependence. In this paper, we propose a filter method that uses distance correlation as the criterion for feature selection in Random Forest regression. We conduct extensive simulation studies to compare its performance with that of existing methods under various data settings, in terms of prediction mean squared error. The results show that the proposed method is competitive with existing methods and outperforms all of them on high-dimensional (p ≥ 300), nonlinearly related data sets. The applicability of the proposed method is also illustrated with two real data applications.
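The contrast drawn in the abstract between Pearson correlation and distance correlation can be illustrated with a minimal sketch. The code below is not the authors' implementation: it computes the standard sample distance correlation (V-statistic form) for one-dimensional samples in pure Python and ranks features by it. The function names (`pearson`, `distance_correlation`, `rank_features`) and the top-k selection rule are assumptions made for illustration; the abstract does not state how the paper thresholds or ranks features, and the subsequent Random Forest fit is omitted.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation between two equal-length 1-D samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # convention chosen here for constant samples
    return sxy / math.sqrt(sxx * syy)


def _centered_distances(v):
    """Double-centered matrix of pairwise absolute differences."""
    n = len(v)
    d = [[abs(a - b) for b in v] for a in v]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    # The distance matrix is symmetric, so column means equal row means.
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation(x, y):
    """Sample distance correlation between two equal-length 1-D samples."""
    n = len(x)
    a, b = _centered_distances(x), _centered_distances(y)
    pairs = [(i, j) for i in range(n) for j in range(n)]
    dcov2 = sum(a[i][j] * b[i][j] for i, j in pairs) / n ** 2
    dvar2_x = sum(a[i][j] ** 2 for i, j in pairs) / n ** 2
    dvar2_y = sum(b[i][j] ** 2 for i, j in pairs) / n ** 2
    denom = math.sqrt(dvar2_x * dvar2_y)
    if denom == 0:
        return 0.0  # convention chosen here for constant samples
    return math.sqrt(max(dcov2, 0.0) / denom)


def rank_features(features, y, k):
    """Return the names of the k features with the largest distance
    correlation with y (hypothetical top-k filter rule)."""
    scores = {name: distance_correlation(col, y)
              for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # Symmetric grid on [-1, 1] with a purely quadratic response.
    x = [i / 50 - 1 for i in range(101)]
    y = [v * v for v in x]
    print("Pearson:", pearson(x, y))
    print("Distance correlation:", distance_correlation(x, y))
```

On a symmetric grid with a quadratic response, the Pearson coefficient is numerically zero while the distance correlation is clearly positive, which is the kind of nonlinear dependence a distance-correlation filter is meant to catch. In a full pipeline, the names returned by `rank_features` would be used to subset the design matrix before fitting a Random Forest regressor.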
Pages: 15