Tight Risk Bounds for Gradient Descent on Separable Data

Cited by: 0
Authors
Schliserman, Matan [1 ]
Koren, Tomer [1 ,2 ]
Affiliations
[1] Tel Aviv Univ, Blavatnik Sch Comp Sci, Tel Aviv, Israel
[2] Google Res, Tel Aviv, Israel
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023) | 2023
Funding
European Research Council; Israel Science Foundation;
Keywords
DOI
N/A
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the generalization properties of unregularized gradient methods applied to separable linear classification, a setting that has received considerable attention since the pioneering work of Soudry et al. [14]. We establish tight upper and lower (population) risk bounds for gradient descent in this setting, for any smooth loss function, expressed in terms of its tail decay rate. Our bounds take the form $\Theta\big(r_{\ell,T}^2/(\gamma^2 T) + r_{\ell,T}^2/(\gamma^2 n)\big)$, where $T$ is the number of gradient steps, $n$ is the size of the training set, $\gamma$ is the data margin, and $r_{\ell,T}$ is a complexity term that depends on the tail decay rate of the loss function (and on $T$). Our upper bound greatly improves the existing risk bounds due to Shamir [13] and Schliserman and Koren [12], which either applied to specific loss functions or imposed extraneous technical assumptions, and applies to virtually any convex and smooth loss function. Our risk lower bound is the first in this context and establishes the tightness of our general upper bound for any given tail decay rate and in all parameter regimes. The proof technique used to show these results is also markedly simpler than in previous work, and is straightforward to extend to other gradient methods; we illustrate this by providing analogous results for Stochastic Gradient Descent.
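To make the setting concrete, below is a minimal sketch (not the authors' code) of the setup the abstract describes: unregularized full-batch gradient descent with the smooth logistic loss on linearly separable data. The data generation, dimensions, step size, and iteration count are all illustrative assumptions chosen here, not values from the paper.

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

# Illustrative sketch only: unregularized gradient descent on a linearly
# separable binary classification problem with the smooth logistic loss,
# the setting studied in the paper. All constants (n, d, eta, T) are
# hypothetical choices for this example.

rng = np.random.default_rng(0)

# Separable data: labels come from a ground-truth direction w_star, so
# the dataset has a positive margin gamma along w_star.
n, d = 200, 20
w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_star)

def logistic_grad(w):
    """Gradient of the empirical loss (1/n) sum_i log(1 + exp(-y_i <x_i, w>))."""
    margins = y * (X @ w)
    coef = -expit(-margins)  # per-sample derivative w.r.t. the margin
    return (coef * y) @ X / n

# Plain full-batch gradient descent with a constant step size, kept small
# relative to the smoothness of the empirical logistic loss.
w = np.zeros(d)
eta, T = 0.1, 2000
for _ in range(T):
    w -= eta * logistic_grad(w)

# On separable data the training error reaches zero while ||w|| keeps
# growing; the population risk of this iterate is what the paper bounds.
print("train error:", np.mean(np.sign(X @ w) != y))
print("||w|| after T steps:", np.linalg.norm(w))
```

In this regime the empirical risk decays toward zero while the iterate norm diverges, which is why the paper's bounds are stated in terms of the number of steps $T$, the sample size $n$, the margin $\gamma$, and the tail decay rate of the loss.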
Pages: 11
References
16 items
  • [1] Agarwal A, 2014, PR MACH LEARN RES, V32, P1638
  • [2] Bartlett P. L., 2003, Journal of Machine Learning Research, V3, P463, DOI 10.1162/153244303321897690
  • [3] Ji Ziwei, 2018, ARXIV
  • [4] Ji Ziwei, 2020, P MACHINE LEARNING R, V125, P2109
  • [5] Ji Ziwei, 2019, J ENV SCI CHINA ENGL
  • [6] Kakade Sham M, 2008, ADV NEURAL INFORM PR, V21
  • [7] Lei YW, 2020, PR MACH LEARN RES, V119
  • [8] Nacson Mor Shpigel, 2019, PMLR, P3051
  • [9] Nacson Mor Shpigel, 2019, PMLR, P3420
  • [10] Needell D, 2014, ADV NEUR IN, V27