Bilevel hyperparameter optimization for support vector classification: theoretical analysis and a solution method

Cited: 2
Authors
Li, Qingna [1]
Li, Zhen [2]
Zemkoho, Alain [3]
Affiliations
[1] Beijing Inst Technol, Sch Math & Stat, Beijing Key Lab MCAACl, Key Lab Math Theory & Computat Informat Secur, Beijing 100081, Peoples R China
[2] Beijing Inst Technol, Sch Math & Stat, Beijing 100081, Peoples R China
[3] Univ Southampton, Sch Math Sci, Southampton SO17 1BJ, Hants, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF);
Keywords
Support vector classification; Hyperparameter selection; Bilevel optimization; Mathematical program with equilibrium constraints; C-stationarity; MATHEMATICAL PROGRAMS; MODEL SELECTION; OPTIMALITY CONDITIONS; REFORMULATION; CONVERGENCE; ALGORITHM;
DOI
10.1007/s00186-022-00798-6
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Discipline codes
070105; 12; 1201; 1202; 120202;
Abstract
Support vector classification (SVC) is a classical and well-performing learning method for classification problems. A regularization parameter, which significantly affects classification performance, has to be chosen, and this is usually done via cross-validation. In this paper, we reformulate the hyperparameter selection problem for support vector classification as a bilevel optimization problem in which the upper-level problem minimizes the average number of misclassified data points over all the cross-validation folds, and the lower-level problems are the l1-loss SVC problems, one for each fold in T-fold cross-validation. The resulting bilevel optimization model is then converted to a mathematical program with equilibrium constraints (MPEC). To solve this MPEC, we propose a global relaxation cross-validation algorithm (GR-CV) based on the well-known Scholtes-type global relaxation method (GRM). It is proven to converge to a C-stationary point. Moreover, we prove that the MPEC-tailored version of the Mangasarian-Fromovitz constraint qualification (MFCQ), a key property for guaranteeing the convergence of the GRM, automatically holds at every feasible point of this MPEC. Extensive numerical results verify the efficiency of the proposed approach; in particular, compared with other methods, our algorithm enjoys superior generalization performance on almost all the data sets used in this paper.
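To illustrate the Scholtes-type global relaxation scheme the abstract refers to, here is a minimal sketch on a toy MPEC, not the paper's SVC bilevel model: the complementarity constraint 0 <= x1 ⊥ x2 >= 0 is relaxed to x1*x2 <= t (with x1, x2 >= 0), each relaxed problem is solved with an off-the-shelf NLP solver, and t is driven to zero with warm starts. The toy objective and all numerical choices (solver, shrink factor, tolerances) are assumptions for illustration only.

```python
# Sketch of a Scholtes-type global relaxation on the toy MPEC
#   min (x1-1)^2 + (x2-1)^2   s.t.   0 <= x1 ⊥ x2 >= 0.
# The complementarity x1*x2 = 0 is relaxed to x1*x2 <= t, and t -> 0.
import numpy as np
from scipy.optimize import minimize

def solve_relaxed(t, x0):
    """Solve the t-relaxed NLP from the warm start x0."""
    obj = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 1.0) ** 2
    cons = [
        {"type": "ineq", "fun": lambda x: t - x[0] * x[1]},  # x1*x2 <= t
        {"type": "ineq", "fun": lambda x: x[0]},             # x1 >= 0
        {"type": "ineq", "fun": lambda x: x[1]},             # x2 >= 0
    ]
    return minimize(obj, x0, constraints=cons, method="SLSQP").x

x = np.array([0.5, 0.5])  # arbitrary feasible starting point
t = 1.0
while t > 1e-8:
    x = solve_relaxed(t, x)  # warm-start from the previous solution
    t *= 0.1                 # shrink the relaxation parameter

print(np.round(x, 4))  # an approximately complementary, C-stationary point
```

Under suitable constraint qualifications (the MPEC-MFCQ discussed in the abstract), accumulation points of such a sequence are C-stationary for the original MPEC; they need not be local minimizers.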
Pages: 315-350
Page count: 36