An Instance- and Label-Based Feature Selection Method in Classification Tasks

Cited by: 2
Authors
Fan, Qingcheng [1 ]
Liu, Sicong [1 ]
Zhao, Chunjiang [1 ,2 ]
Li, Shuqin [1 ]
Affiliations
[1] Northwest A&F Univ, Coll Informat Engn, 3 Taicheng Rd, Xianyang 712100, Peoples R China
[2] Beijing Acad Agr & Forestry Sci, Res Ctr Informat Technol, Beijing 100097, Peoples R China
Keywords
feature selection; manifold learning; classification;
DOI
10.3390/info14100532
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Feature selection is crucial in classification tasks, as it extracts relevant information while reducing redundancy. This paper presents a novel method that considers both instance and label correlation. Using the least squares method, we compute the linear relationship between each feature and the target variable, yielding correlation coefficients; features with high coefficients are selected. Compared to traditional methods, our approach offers two advantages. First, it effectively selects features highly correlated with the target variable from a large feature set, reducing data dimensionality and improving the efficiency of analysis and modeling. Second, it accounts for label correlation between features, improving the quality of the selected features and subsequent model performance. Experimental results on three datasets demonstrate that our method selects features with high correlation coefficients, leading to superior model performance. Notably, it achieves a minimum accuracy improvement of 3.2% with the advanced classifier LightGBM, surpassing other feature selection methods. In summary, the proposed method, based on instance and label correlation, offers a suitable solution for classification problems.
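The instance-correlation step described in the abstract — fitting a per-feature least-squares relationship to the target and ranking features by the resulting correlation coefficient — can be sketched roughly as follows. This is a minimal illustration, not the authors' exact algorithm: the function name is my own, and the label-correlation component of their method is omitted here.

```python
import numpy as np

def select_features_by_correlation(X, y, k):
    """Rank features by the absolute correlation between each feature
    column of X and the target y, and return the top-k column indices.

    For centered data, the univariate least-squares slope for feature j
    reduces to cov(x_j, y) / var(x_j); normalising by both standard
    deviations instead gives the Pearson correlation coefficient.
    """
    Xc = X - X.mean(axis=0)          # center each feature column
    yc = y - y.mean()                # center the target
    cov = Xc.T @ yc / len(y)         # per-feature covariance with y
    corr = cov / (Xc.std(axis=0) * yc.std() + 1e-12)  # guard against zero variance
    return np.argsort(-np.abs(corr))[:k]
```

For example, on synthetic data where only one column drives the target, that column is ranked first; a full reimplementation of the paper's method would additionally weight these scores by correlations among the labels.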
Pages: 14