Fast Margin Maximization via Dual Acceleration

Cited by: 0
Authors
Ji, Ziwei [1]
Srebro, Nathan [2]
Telgarsky, Matus [1]
Affiliations
[1] Univ Illinois, Dept Comp Sci, Urbana, IL 61801 USA
[2] Toyota Tech Inst Chicago, Chicago, IL USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
DOI: not available
CLC classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
We present and analyze a momentum-based gradient method for training linear classifiers with an exponentially-tailed loss (e.g., the exponential or logistic loss), which maximizes the classification margin on separable data at a rate of Õ(1/t^2). This contrasts with a rate of O(1/log(t)) for standard gradient descent and O(1/t) for normalized gradient descent. The momentum-based method is derived via the convex dual of the maximum-margin problem, specifically by applying Nesterov acceleration to this dual, which yields a simple and intuitive method in the primal. This dual view can also be used to derive a stochastic variant, which performs adaptive nonuniform sampling via the dual variables.
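The abstract only sketches the idea at a high level. As a purely illustrative, hedged sketch (not the paper's dual-accelerated algorithm), the snippet below combines a Nesterov-style momentum step with a normalized gradient of the exponential loss on synthetic separable data and tracks the resulting normalized margin; the data, step size, and momentum schedule are all assumptions made for the illustration.

```python
import numpy as np

# Illustrative sketch only: Nesterov-style momentum with normalized gradient
# steps on the exponential loss, NOT the paper's exact dual-accelerated method.
# All data and hyperparameters below are assumed for demonstration.

rng = np.random.default_rng(0)
n, d = 200, 5
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)                       # linearly separable labels by construction

def exp_loss_grad(w):
    """Gradient of the mean exponential loss (1/n) * sum_i exp(-y_i <x_i, w>)."""
    weights = np.exp(-y * (X @ w))            # per-example weights (dual-like quantities)
    return -(weights[:, None] * (y[:, None] * X)).mean(axis=0)

def normalized_margin(w):
    """Minimum margin min_i y_i <x_i, w> divided by ||w||."""
    norm = np.linalg.norm(w)
    return np.min(y * (X @ w)) / norm if norm > 0 else 0.0

w = np.zeros(d)
w_prev = np.zeros(d)
eta = 1.0                                     # assumed step size
for t in range(1, 2001):
    beta = (t - 1) / (t + 2)                  # standard Nesterov momentum schedule
    v = w + beta * (w - w_prev)               # look-ahead (extrapolated) point
    g = exp_loss_grad(v)
    w_prev, w = w, v - eta * g / (np.linalg.norm(g) + 1e-12)  # normalized gradient step

print(f"final normalized margin: {normalized_margin(w):.4f}")
```

The normalized margin printed at the end should approach the maximum margin of the synthetic data; the paper's contribution is a specific dual-accelerated scheme with a provable Õ(1/t^2) rate, which this generic momentum sketch does not claim to reproduce.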
Pages: 10