Fast Margin Maximization via Dual Acceleration

Cited by: 0
Authors
Ji, Ziwei [1]
Srebro, Nathan [2]
Telgarsky, Matus [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Toyota Tech Inst Chicago, Chicago, IL USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of Õ(1/t^2). This contrasts with a rate of O(1/log(t)) for standard gradient descent and O(1/t) for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive nonuniform sampling via the dual variables.
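The abstract only sketches the idea at a high level. As a purely illustrative, hedged sketch (not the paper's dual-accelerated algorithm), the snippet below combines a Nesterov-style momentum step with a normalized gradient of the exponential loss on synthetic separable data and tracks the resulting normalized margin; the data, step size, and momentum schedule are all assumptions made for the illustration.

```python
import numpy as np

# Illustrative sketch only: Nesterov-style momentum with normalized gradient
# steps on the exponential loss, NOT the paper's exact dual-accelerated method.
# All data and hyperparameters below are assumed for demonstration.

rng = np.random.default_rng(0)
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                       # linearly separable labels by construction

def exp_loss_grad(w):
    """Gradient of the mean exponential loss (1/n) * sum_i exp(-y_i <x_i, w>)."""
    weights = np.exp(-y * (X @ w))            # per-example weights (dual-like quantities)
    return -(weights[:, None] * (y[:, None] * X)).mean(axis=0)

def normalized_margin(w):
    """Minimum margin min_i y_i <x_i, w> divided by ||w||."""
    norm = np.linalg.norm(w)
    return np.min(y * (X @ w)) / norm if norm > 0 else 0.0

w = np.zeros(d)
w_prev = np.zeros(d)
eta = 1.0                                     # assumed step size
for t in range(1, 2001):
    beta = (t - 1) / (t + 2)                  # standard Nesterov momentum schedule
    v = w + beta * (w - w_prev)               # look-ahead (extrapolated) point
    g = exp_loss_grad(v)
    w_prev, w = w, v - eta * g / (np.linalg.norm(g) + 1e-12)  # normalized gradient step

print(f"final normalized margin: {normalized_margin(w):.4f}")
```

The normalized margin printed at the end should approach the maximum margin of the synthetic data; the paper's contribution is a specific dual-accelerated scheme with a provable Õ(1/t^2) rate, which this generic momentum sketch does not claim to reproduce.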
Pages: 10