Generalization Bounds for Label Noise Stochastic Gradient Descent

Cited by: 0
Authors
Huh, Jung Eun [1 ]
Rebeschini, Patrick [1 ]
Affiliations
[1] Univ Oxford, Dept Stat, Oxford, England
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, 2024, Vol. 238
Keywords
STABILITY
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We develop generalization error bounds for stochastic gradient descent (SGD) with label noise in non-convex settings under uniform dissipativity and smoothness conditions. Under a suitable choice of semimetric, we establish a contraction in Wasserstein distance of the label noise stochastic gradient flow that depends polynomially on the parameter dimension d. Using the framework of algorithmic stability, we derive time-independent generalization error bounds for the discretized algorithm with a constant learning rate. The error bound we achieve scales polynomially in d and decays at the rate n^(-2/3), where n is the sample size. This rate is better than the best-known rate of n^(-1/2) established for stochastic gradient Langevin dynamics (SGLD), which employs parameter-independent Gaussian noise, under similar conditions. Our analysis offers quantitative insights into the effect of label noise.
Pages: 26
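The abstract contrasts label noise SGD, where fresh noise is injected into the training labels at each step so the perturbation enters through the gradient and is data-dependent, with SGLD, which adds parameter-independent isotropic Gaussian noise to each update. The following is a minimal illustrative sketch of that contrast on a squared-loss linear model; the model, step size eta, noise level sigma, and inverse temperature beta are assumptions for illustration, not the paper's actual setup or experiments.

```python
import numpy as np

# Illustrative sketch (not the paper's method or experiments): one step of
# label noise SGD versus one step of SGLD on a squared-loss linear model.

rng = np.random.default_rng(0)
n, d = 100, 10
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = X @ theta_true + 0.1 * rng.normal(size=n)

eta = 0.01      # constant learning rate (assumed value)
sigma = 0.5     # std of injected label noise (assumed value)
beta = 10.0     # inverse temperature for SGLD (assumed value)

def label_noise_sgd_step(theta, i):
    """SGD on a freshly perturbed label y_i + eps: the injected noise enters
    the gradient as -eps * X[i], i.e. it is data-dependent rather than isotropic."""
    eps = sigma * rng.normal()
    residual = X[i] @ theta - (y[i] + eps)   # gradient of 0.5 * residual^2 is residual * X[i]
    return theta - eta * residual * X[i]

def sgld_step(theta, i):
    """SGLD: plain stochastic gradient plus parameter-independent Gaussian noise."""
    residual = X[i] @ theta - y[i]
    xi = rng.normal(size=d)
    return theta - eta * residual * X[i] + np.sqrt(2.0 * eta / beta) * xi

theta_ln, theta_sgld = np.zeros(d), np.zeros(d)
for t in range(1000):
    i = rng.integers(n)                      # sample one data point per step
    theta_ln = label_noise_sgd_step(theta_ln, i)
    theta_sgld = sgld_step(theta_sgld, i)
```

The only difference between the two updates is where the randomness beyond subsampling enters: through the perturbed label inside the gradient for label noise SGD, versus as an additive isotropic term for SGLD. The paper's bounds quantify how this difference affects generalization under dissipativity and smoothness assumptions.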