A robust SVM-based approach with feature selection and outliers detection for classification problems

Cited by: 46
Authors
Baldomero-Naranjo, Marta [1]
Martinez-Merino, Luisa I. [1, 2]
Rodriguez-Chia, Antonio M. [1]
Affiliations
[1] Univ Cadiz, Fac Ciencias, Dept Estadist & Invest Operat, Cadiz, Spain
[2] Univ Sevilla IMUS, Inst Matemat, Seville, Spain
Keywords
Data science; Classification; Support vector machine; Outliers detection; Feature selection; Mixed integer programming; SUPPORT VECTOR MACHINES; DETECTION SYSTEM; GENE SELECTION; KERNEL SEARCH; PREDICTION; CANCER;
DOI
10.1016/j.eswa.2021.115017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper proposes a robust classification model, based on the support vector machine (SVM), which simultaneously deals with outlier detection and feature selection. The classifier is built considering the ramp loss margin error, and it includes a budget constraint to limit the number of selected features. The search for this classifier is modeled using a mixed-integer formulation with big-M parameters. Two different approaches (exact and heuristic) are proposed to solve the model. The heuristic approach is validated by comparing the quality of its solutions with those of the exact approach. In addition, the classifiers obtained with the heuristic method are tested and compared with existing SVM-based models to demonstrate their efficiency.
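The two ingredients named in the abstract, the ramp loss and the feature budget, can be illustrated with a small sketch. This is not the paper's mixed-integer formulation or its solution method. It only evaluates a given linear classifier, and it assumes the common convention that the ramp loss is the hinge loss capped at 2. The names `hinge_loss`, `ramp_loss`, `evaluate`, `budget` and `tol` are hypothetical and chosen for illustration.

```python
def hinge_loss(margin):
    """Standard SVM hinge loss; grows without bound for badly misclassified points."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin, cap=2.0):
    """Hinge loss truncated at `cap` (assumed to be 2 here), so that a single
    outlier can contribute at most `cap` to the total error."""
    return min(cap, hinge_loss(margin))


def evaluate(w, b, X, y, budget, tol=1e-9):
    """Evaluate a linear classifier sign(w.x + b) on labels y in {-1, +1}.

    Reports the indices of features with nonzero weight, whether their count
    respects the budget, and the total hinge and ramp losses.
    """
    selected = [j for j, wj in enumerate(w) if abs(wj) > tol]
    margins = [
        yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        for xi, yi in zip(X, y)
    ]
    return {
        "selected": selected,
        "within_budget": len(selected) <= budget,
        "hinge": sum(hinge_loss(m) for m in margins),
        "ramp": sum(ramp_loss(m) for m in margins),
    }


# Toy data: the third point is a mislabeled outlier far on the wrong side.
X = [[2.0, 0.0], [-2.0, 0.0], [-10.0, 0.0]]
y = [1, -1, 1]
result = evaluate(w=[1.0, 0.0], b=0.0, X=X, y=y, budget=1)
```

On this toy data the outlier dominates the hinge loss, while its ramp-loss contribution is bounded by the cap. That bounded contribution is the intuition behind using the ramp loss for robustness to outliers.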
Pages: 16
Related papers
58 in total
[1]   A feature selection algorithm for intrusion detection system based on Pigeon Inspired Optimizer [J].
Alazzam, Hadeel ;
Sharieh, Ahmad ;
Sabri, Khair Eddin .
EXPERT SYSTEMS WITH APPLICATIONS, 2020, 148
[2]   Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays [J].
Alon, U ;
Barkai, N ;
Notterman, DA ;
Gish, K ;
Ybarra, S ;
Mack, D ;
Levine, AJ .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 1999, 96 (12) :6745-6750
[3]   Kernel search: A general heuristic for the multi-dimensional knapsack problem [J].
Angelelli, Enrico ;
Mansini, Renata ;
Speranza, M. Grazia .
COMPUTERS & OPERATIONS RESEARCH, 2010, 37 (11) :2017-2026
[4]   Feature selection for support vector machines using Generalized Benders Decomposition [J].
Aytug, Haldun .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2015, 244 (01) :210-218
[5]   Tightening big Ms in integer programming formulations for support vector machines with ramp loss [J].
Baldomero-Naranjo, Marta ;
Martinez-Merino, Luisa I. ;
Rodriguez-Chia, Antonio M. .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2020, 286 (01) :84-100
[6]  
Bao H, 2016, INT C PAR DISTRIB SY, P948, DOI [10.1109/ICPADS.2016.0127, 10.1109/ICPADS.2016.125]
[7]   On handling indicator constraints in mixed integer programming [J].
Belotti, Pietro ;
Bonami, Pierre ;
Fischetti, Matteo ;
Lodi, Andrea ;
Monaci, Michele ;
Nogales-Gomez, Amaya ;
Salvagnin, Domenico .
COMPUTATIONAL OPTIMIZATION AND APPLICATIONS, 2016, 65 (03) :545-566
[8]  
Blanco V., 2020, ARXIV200410170VI
[9]  
Blanco V, 2020, J MACH LEARN RES, V21
[10]   Optimal arrangements of hyperplanes for SVM-based multiclass classification [J].
Blanco, Victor ;
Japon, Alberto ;
Puerto, Justo .
ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2020, 14 (01) :175-199