Feature selection for classification models via bilevel optimization

Cited by: 16
Authors
Agor, Joseph [1]
Ozaltin, Osman Y. [2]
Affiliations
[1] North Carolina State Univ, Operat Res, Raleigh, NC 27606 USA
[2] North Carolina State Univ, Edward P Fitts Dept Ind & Syst Engn, Raleigh, NC 27695 USA
Keywords
Feature selection; Classification; Bilevel programming; Cross validation; LINEAR BILEVEL; GENETIC ALGORITHMS; SEARCH METHOD; INFLUENZA; EVOLUTION; PREDICTION; MUTATIONS; VACCINE; CANCER;
DOI
10.1016/j.cor.2018.05.005
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Selecting model features that ensure adequate out-of-sample classification performance is difficult in real-life applications, often because the number of candidate features is large. We propose a bilevel programming approach to the feature selection problem for classification and develop a novel genetic algorithm to solve it. We implement the proposed framework in three case studies, in which we classify influenza strains based on antigenic variety, distinguish between good- and bad-quality colposcopy images, and identify splice junction sites in genetic sequences. As a benchmark for the proposed genetic algorithm, we use a derivative-free optimization method to solve the bilevel feature selection problems in these case studies. The computational experiments show that the proposed bilevel framework improves overall classification performance while selecting the most important features for the model. © 2018 Elsevier Ltd. All rights reserved.
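The bilevel idea in the abstract can be illustrated with a small sketch. This is not the authors' formulation or their genetic algorithm; it is a minimal toy under stated assumptions. The upper level searches over binary feature masks with a simple elitist genetic algorithm, and the lower level fits a classifier on the training folds for each mask. The nearest-centroid classifier, the per-feature penalty, the synthetic data, and every function name here (`fit_centroids`, `cv_accuracy`, `genetic_search`, `make_toy_data`) are hypothetical choices made for illustration only.

```python
import random

def fit_centroids(X, y, mask):
    """Lower level (assumed): fit a nearest-centroid model on selected features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return cols, centroids

def predict(model, x):
    cols, centroids = model
    def dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(cols, c))
    return min(sorted(centroids), key=lambda label: dist(centroids[label]))

def cv_accuracy(X, y, mask, k=4):
    """Upper-level objective: validation accuracy pooled over k folds."""
    if not any(mask):
        return 0.0
    n, correct = len(X), 0
    for fold in range(k):
        val = [i for i in range(n) if i % k == fold]
        trn = [i for i in range(n) if i % k != fold]
        model = fit_centroids([X[i] for i in trn], [y[i] for i in trn], mask)
        correct += sum(predict(model, X[i]) == y[i] for i in val)
    return correct / n

def genetic_search(X, y, n_features, pop_size=20, generations=15,
                   p_mut=0.1, penalty=0.01, seed=0):
    """Upper level: elitist GA over feature masks (illustrative settings)."""
    rng = random.Random(seed)
    def fitness(mask):
        # Assumed trade-off: accuracy minus a small cost per selected feature.
        return cv_accuracy(X, y, mask) - penalty * sum(mask)
    pop = [[rng.random() < 0.5 for _ in range(n_features)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]          # keep the better half unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [(not g) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = elite + children
    return max(pop, key=fitness)

def make_toy_data(n=80, n_features=6, seed=1):
    """Synthetic data: features 0 and 1 are informative, the rest are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = (i // 4) % 2  # keeps both classes present in every fold
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] = 4 * label + rng.gauss(0, 0.5)
        row[1] = -4 * label + rng.gauss(0, 0.5)
        X.append(row)
        y.append(label)
    return X, y

X, y = make_toy_data()
best = genetic_search(X, y, n_features=6)
print("selected features:", [j for j, keep in enumerate(best) if keep])
print("cross-validated accuracy:", cv_accuracy(X, y, best))
```

The nesting is the point of the sketch: each candidate mask proposed by the upper level triggers a fresh lower-level fit, and only held-out accuracy feeds back into the search. A real application would likely swap in a stronger classifier and a more careful validation split.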
Pages: 156-168
Number of pages: 13