Characterizing the implicit bias via a primal-dual analysis

被引：0

作者：

Ji, Ziwei ^{[1
]}

Telgarsky, Matus ^{[1
]}

机构：

[1] Univ Illinois, Urbana, IL 61801 USA

来源：

ALGORITHMIC LEARNING THEORY, VOL 132 | 2021年 / 132卷

关键词：

ADABOOST;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

This paper shows that the implicit bias of gradient descent on linearly separable data is exactly characterized by the optimal solution of a dual optimization problem given by a smoothed margin, even for general losses. This is in contrast to prior results, which are often tailored to exponentially-tailed losses. For the exponential loss specifically, with n training examples and t gradient descent steps, our dual analysis further allows us to prove an O (ln(n)= ln(t)) convergence rate to the l(2) maximum margin direction, when a constant step size is used. This rate is tight in both n and t, which has not been presented by prior work. On the other hand, with a properly chosen but aggressive step size schedule, we prove O(1=t) rates for both l(2) margin maximization and implicit bias, whereas prior work (including all first-order methods for the general hard-margin linear SVM problem) proved (O) over tilde (1/root t) margin rates, or O(1/t) margin rates to a suboptimal margin, with an implied (slower) bias rate. Our key observations include that gradient descent on the primal variable naturally induces a mirror descent update on the dual variable, and that the dual objective in this setting is smooth enough to give a faster rate.

引用

页数：33

共 32 条

[1] Borwein JM, 2000, Convex Analysis and Nonlinear Optimization. Theory and Examples, Canadian Mathematical Society
[2] Chizat L, 2020, Arxiv, DOI arXiv:2002.04486
[3] Sublinear Optimization for Machine Learning
Clarkson, Kenneth L.
Hazan, Elad
Woodruff, David P.
[J]. JOURNAL OF THE ACM, 2012, 59 (05)
[4] Logistic regression, AdaBoost and Bregman distances
Collins, M
Schapire, RE
Singer, Y
[J]. MACHINE LEARNING, 2002, 48 (1-3) : 253 - 285
[5] Foster Dylan J, Advances in Neural Information Processing Systems, P6240
[6] Freund Robert M, 2013, arXiv
[7] A decision-theoretic generalization of on-line learning and an application to boosting
Freund, Y
Schapire, RE
[J]. JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1997, 55 (01) : 119 - 139
[8] Gordon, 1999, APPROXIMATE SOLUTION
[9] Gunasekar S, 2020, Arxiv, DOI arXiv:1802.08246
[10] Gunasekar Suriya, Advances in Neural Information Processing Systems, P9461

← 1 2 3 4 →