Characterizing the implicit bias via a primal-dual analysis

被引:0
作者
Ji, Ziwei [1 ]
Telgarsky, Matus [1 ]
机构
[1] Univ Illinois, Urbana, IL 61801 USA
来源
ALGORITHMIC LEARNING THEORY, VOL 132 | 2021年 / 132卷
关键词
ADABOOST;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with n training examples and t gradient descent steps, our dual analysis further allows us to prove an O (ln(n)= ln(t)) convergence rate to the l(2) maximum margin direction, when a constant step size is used. This rate is tight in both n and t, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove O(1=t) rates for both l(2) margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved (O) over tilde (1/root t) margin rates, or O(1/t) margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.
引用
收藏
页数:33
相关论文
共 32 条
  • [1] Borwein JM, 2000, Convex Analysis and Nonlinear Optimization. Theory and Examples, Canadian Mathematical Society
  • [2] Chizat L, 2020, Arxiv, DOI arXiv:2002.04486
  • [3] Sublinear Optimization for Machine Learning
    Clarkson, Kenneth L.
    Hazan, Elad
    Woodruff, David P.
    [J]. JOURNAL OF THE ACM, 2012, 59 (05)
  • [4] Logistic regression, AdaBoost and Bregman distances
    Collins, M
    Schapire, RE
    Singer, Y
    [J]. MACHINE LEARNING, 2002, 48 (1-3) : 253 - 285
  • [5] Foster Dylan J, Advances in Neural Information Processing Systems, P6240
  • [6] Freund Robert M, 2013, arXiv
  • [7] A decision-theoretic generalization of on-line learning and an application to boosting
    Freund, Y
    Schapire, RE
    [J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (01) : 119 - 139
  • [8] Gordon, 1999, APPROXIMATE SOLUTION
  • [9] Gunasekar S, 2020, Arxiv, DOI arXiv:1802.08246
  • [10] Gunasekar Suriya, Advances in Neural Information Processing Systems, P9461