A contrast-based feature selection algorithm for high-dimensional datasets in machine learning

Cited: 0
Authors
Cao, Chunxu [1 ]
Zhang, Qiang [2 ,3 ]
Deng, Yuhui [3 ]
Affiliations
[1] Beijing Normal Univ, Sch Math Sci, Beijing 100875, Peoples R China
[2] Beijing Normal Univ, Res Ctr Math, Zhuhai 519087, Peoples R China
[3] Beijing Normal Hong Kong Baptist Univ, Guangdong Prov Key Lab Interdisciplinary Res & App, Zhuhai 519087, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Feature selection; Filter feature selection method; Model-free feature selection method; High-dimensional data; Discriminative features; Extreme gradient boosting; INFORMATION;
DOI
10.1016/j.ins.2025.122308
CLC Classification
TP [Automation Technology, Computer Technology];
Subject Classification
0812;
Abstract
Feature selection plays a pivotal role in enhancing machine learning models by identifying relevant features and eliminating redundancies. However, existing methods often face challenges with high computational costs and inefficiencies, particularly when applied to large-scale, high-dimensional datasets. To address these issues, we propose ContrastFS, a novel contrast-based feature selection method that evaluates feature importance by analyzing discrepancies in feature distributions across different classes. By leveraging a dimensionless surrogate of class-wise feature statistics, ContrastFS enables efficient assessment of both feature relevance and redundancy. Comprehensive experiments on diverse benchmark datasets demonstrate that ContrastFS achieves computational efficiency that is several orders of magnitude higher than that of state-of-the-art methods while maintaining competitive accuracy. Furthermore, it effectively reduces feature redundancy, enhancing both model interpretability and performance. With its efficiency, scalability, and robustness, ContrastFS offers a powerful solution for feature selection in high-dimensional datasets, making it particularly suited for large-scale artificial intelligence applications where speed and accuracy are critical.
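The abstract's core idea — scoring each feature by the contrast between its class-conditional distributions, using dimensionless (standardized) statistics — can be illustrated with a minimal sketch. This is a hypothetical simplification, not the authors' exact formulation: here the "contrast" is taken to be the spread of the standardized class-wise means of each feature.

```python
import numpy as np

def contrast_scores(X, y):
    """Score features by contrast between class-wise means.

    Hypothetical sketch, not the paper's exact method: each feature is
    standardized globally (making its statistics dimensionless), and its
    score is the spread of its class-conditional means across classes.
    """
    classes = np.unique(y)
    # Standardize each feature so the statistics are dimensionless.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0) + 1e-12  # guard against zero variance
    Z = (X - mu) / sigma
    # Class-wise means of the standardized features: shape (n_classes, n_features).
    class_means = np.stack([Z[y == c].mean(axis=0) for c in classes])
    # Contrast: spread of the class-wise means across classes.
    return class_means.std(axis=0)

# Toy usage: feature 0 separates the two classes, feature 1 is pure noise.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 50)
X = np.column_stack([
    np.where(y == 0, 0.0, 3.0) + rng.normal(0, 0.5, 100),
    rng.normal(0, 1.0, 100),
])
scores = contrast_scores(X, y)
assert scores[0] > scores[1]  # the discriminative feature scores higher
```

Because such a score is computed from per-class summary statistics rather than from a trained model, it is a filter (model-free) method, which is consistent with the efficiency claims in the abstract.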
Pages: 22