Algorithms for Sparse Support Vector Machines

Cited by: 3
Authors
Landeros, Alfonso [1]
Lange, Kenneth [1, 2, 3]
Affiliations
[1] Univ Calif Los Angeles, Dept Computat Med, Los Angeles, CA 90095 USA
[2] Univ Calif Los Angeles, Human Genet, Los Angeles, CA USA
[3] Univ Calif Los Angeles, Stat, Los Angeles, CA USA
Keywords
Discriminant analysis; Julia; Sparsity; Unsupervised learning; MAJORIZATION; RULES;
DOI
10.1080/10618600.2022.2146697
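Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code
020208 ; 070103 ; 0714 ;
Abstract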
Many problems in classification involve huge numbers of irrelevant features. Variable selection reveals the crucial features, reduces the dimensionality of feature space, and improves model interpretation. In the support vector machine literature, variable selection is achieved by $\ell_1$ penalties. These convex relaxations seriously bias parameter estimates toward 0 and tend to admit too many irrelevant features. The current article presents an alternative that replaces penalties by sparse-set constraints. Penalties still appear, but serve a different purpose. The proximal distance principle takes a loss function $L(\beta)$ and adds the penalty $\frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$ capturing the squared Euclidean distance of the parameter vector $\beta$ to the sparsity set $S_k$ where at most $k$ components of $\beta$ are nonzero. If $\beta(\rho)$ represents the minimum of the objective $f_\rho(\beta) = L(\beta) + \frac{\rho}{2}\operatorname{dist}(\beta, S_k)^2$, then $\beta(\rho)$ tends to the constrained minimum of $L(\beta)$ over $S_k$ as $\rho$ tends to $\infty$. We derive two closely related algorithms to carry out this strategy. Our simulated and real examples vividly demonstrate how the algorithms achieve better sparsity without loss of classification power. Supplementary materials for this article are available online.
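The proximal distance principle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the article works with support vector machine losses, whereas this sketch assumes a simple least-squares stand-in, L(beta) = 1/2 ||beta - a||^2, chosen only because each majorization-minimization update then has a closed form. The function names (`project_sparse`, `sq_dist_to_sparse`, `objective`, `minimize_penalized`) and the vector `a` are hypothetical.

```python
def project_sparse(beta, k):
    """Project beta onto S_k: keep the k largest-magnitude entries, zero the rest."""
    if k <= 0:
        return [0.0] * len(beta)
    order = sorted(range(len(beta)), key=lambda i: abs(beta[i]), reverse=True)
    keep = set(order[:k])
    return [b if i in keep else 0.0 for i, b in enumerate(beta)]


def sq_dist_to_sparse(beta, k):
    """Squared Euclidean distance dist(beta, S_k)^2."""
    proj = project_sparse(beta, k)
    return sum((b - p) ** 2 for b, p in zip(beta, proj))


def objective(beta, a, rho, k):
    """f_rho(beta) = L(beta) + (rho / 2) * dist(beta, S_k)^2.

    Assumption: L is the stand-in loss 0.5 * ||beta - a||^2, not an SVM loss.
    """
    loss = 0.5 * sum((b - t) ** 2 for b, t in zip(beta, a))
    return loss + 0.5 * rho * sq_dist_to_sparse(beta, k)


def minimize_penalized(a, rho, k, iters=200):
    """Distance majorization: replace dist(beta, S_k)^2 by ||beta - P(beta_n)||^2.

    For the stand-in loss, the surrogate minimizer is
    beta_{n+1} = (a + rho * P(beta_n)) / (1 + rho).
    """
    beta = list(a)
    for _ in range(iters):
        proj = project_sparse(beta, k)
        beta = [(t + rho * p) / (1.0 + rho) for t, p in zip(a, proj)]
    return beta


if __name__ == "__main__":
    a = [3.0, -0.5, 2.0, 0.1]
    for rho in (1.0, 100.0, 1e6):
        beta = minimize_penalized(a, rho, k=2)
        print(rho, beta, sq_dist_to_sparse(beta, 2))
```

Under these assumptions, the entries outside the two largest-magnitude positions shrink toward zero as rho grows, so the penalized solution approaches the projection of `a` onto S_2, which is the constrained minimum of the stand-in loss.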
Pages: 1097-1108
Number of pages: 12
Related papers
42 in total
[1]  
Barghout L, 2015, STUD BIG DATA, V10, P285, DOI 10.1007/978-3-319-16829-6_12
[2]  
Beltrami E.J., 1970, An Algorithmic Approach to Nonlinear Analysis and Optimization
[3]   Support vector clustering [J].
Ben-Hur, A ;
Horn, D ;
Siegelmann, HT ;
Vapnik, V .
JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (02) :125-137
[4]  
Cauwenberghs G, 2001, ADV NEUR IN, V13, P409
[5]   Distance majorization and its applications [J].
Chi, Eric C. ;
Zhou, Hua ;
Lange, Kenneth .
MATHEMATICAL PROGRAMMING, 2014, 146 (1-2) :409-436
[6]   SUPPORT-VECTOR NETWORKS [J].
CORTES, C ;
VAPNIK, V .
MACHINE LEARNING, 1995, 20 (03) :273-297
[7]  
Courant R., 1994, Bull. Amer. Math. Soc, V1, DOI 10.1201/b16924-5
[8]   Training invariant support vector machines [J].
Decoste, D ;
Schölkopf, B .
MACHINE LEARNING, 2002, 46 (1-3) :161-190
[9]  
Dua D, 2019, UCI MACHINE LEARNING
[10]   Sequence comparison and protein structure prediction [J].
Dunbrack, Roland L., Jr. .
CURRENT OPINION IN STRUCTURAL BIOLOGY, 2006, 16 (03) :374-384