Linear Cost-sensitive Max-margin Embedded Feature Selection for SVM

Cited by: 20
Authors
Aram, Khalid Y. [1 ]
Lam, Sarah S. [2 ]
Khasawneh, Mohammad T. [2 ]
Affiliations
[1] Emporia State Univ, Dept Business Adm, Emporia, KS 66801 USA
[2] SUNY Binghamton, Dept Syst Sci & Ind Engn, Binghamton, NY 13902 USA
Keywords
Classification; Cost-sensitive learning; Feature selection; Mathematical programming; Support vector machines; Vector; Classification; Machine; Cancer
DOI
10.1016/j.eswa.2022.116683
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The information needed for a given machine learning application can often be obtained from a subset of the available features. Strongly relevant features should be retained to achieve desirable model performance. This research focuses on selecting relevant independent features for Support Vector Machine (SVM) classifiers in a cost-sensitive manner. A review of recent literature on feature selection for SVM revealed a lack of linear programming embedded SVM feature selection models; most reviewed models were mixed-integer linear or nonlinear. The review also highlighted a lack of cost-sensitive SVM feature selection models. Cost sensitivity improves the generalization of SVM feature selection models, making them applicable to various cost-of-error situations, and it also helps with handling imbalanced data. This research introduces an SVM-based filter method named Knapsack Max-Margin Feature Selection (KS-MMFS), a proposed linearization of the quadratic Max-Margin Feature Selection (MMFS) model. MMFS provides explicit estimates of feature importance in terms of relevance and redundancy. KS-MMFS was then used to develop a linear cost-sensitive SVM embedded feature selection model. The proposed model was tested on a group of 11 benchmark datasets and compared to relevant models from the literature. The results and analysis showed that different cost-sensitivity (i.e., sensitivity-specificity tradeoff) requirements influence which features are selected, and demonstrated the competitive performance of the proposed model compared with relevant models: an average improvement of 31.8% in classification performance with a 22.4% average reduction in solution time. These results establish the proposed model as an efficient cost-sensitive embedded feature selection method.
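The cost-sensitivity idea the abstract refers to, weighting errors on each class differently so the sensitivity-specificity tradeoff can be tuned and imbalanced data handled, can be illustrated with a class-weighted hinge loss. This is a generic sketch, not the paper's KS-MMFS model; the function name and parameters are hypothetical:

```python
import numpy as np

def cost_sensitive_linear_svm(X, y, c_pos=1.0, c_neg=1.0,
                              lr=0.05, epochs=500, lam=0.01):
    """Fit a linear SVM by subgradient descent on a class-weighted hinge
    loss: errors on the positive class cost c_pos, errors on the negative
    class cost c_neg.  Labels y must be in {-1, +1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    cost = np.where(y == 1, c_pos, c_neg)   # per-sample error cost
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1                   # samples violating the margin
        grad_w = lam * w - (cost[viol] * y[viol]) @ X[viol] / n
        grad_b = -np.sum(cost[viol] * y[viol]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```

Raising `c_pos` relative to `c_neg` shifts the decision boundary toward the negative class, trading specificity for sensitivity; different cost settings can likewise favor different feature subsets in an embedded selection model.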
Pages: 11
Cited references: 57