Stability analysis of stochastic gradient descent for homogeneous neural networks and linear classifiers

Cited by: 6
Authors
Paquin, Alexandre Lemire [1 ]
Chaib-draa, Brahim [1 ]
Giguere, Philippe [1 ]
Affiliations
[1] Laval Univ, Dept Comp Sci & Software Engn, Pavillon Adrien Pouliot, 1065 Ave Med, Quebec City, PQ G1V 0A6, Canada
Keywords
Generalization; Deep learning; Stochastic gradient descent; Stability
DOI
10.1016/j.neunet.2023.04.028
CLC classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We prove new generalization bounds for stochastic gradient descent when training classifiers with invariances. Our analysis is based on the stability framework and covers both the convex case of linear classifiers and the non-convex case of homogeneous neural networks. We analyze stability with respect to the normalized version of the loss function used for training. This leads to investigating a form of angle-wise stability instead of Euclidean stability in weights. For neural networks, the measure of distance we consider is invariant to rescaling the weights of each layer. Furthermore, we exploit the notion of on-average stability in order to obtain a data-dependent quantity in the bound. In our numerical experiments, this data-dependent quantity is more favorable when training with larger learning rates, which may help to shed some light on why larger learning rates can lead to better generalization in some practical scenarios. © 2023 Elsevier Ltd. All rights reserved.
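To make the abstract's angle-wise, rescaling-invariant notion of distance concrete, the sketch below computes a per-layer angular distance between two sets of weights. This is a minimal illustration of the general idea only, not the metric defined in the paper; the helper names `angular_distance` and `layerwise_angular_distance`, and the choice of summing per-layer angles, are assumptions made for this example.

```python
# Illustrative sketch (not the paper's exact metric): an angle-wise distance
# between two networks' weights that is invariant to positive rescaling of
# each layer, in the spirit of the abstract's description.

import numpy as np

def angular_distance(w1: np.ndarray, w2: np.ndarray) -> float:
    """Angle between two weight tensors viewed as vectors.

    Unit-normalizing removes scale, so multiplying either argument by a
    positive constant leaves the result unchanged.
    """
    u1 = w1.ravel() / np.linalg.norm(w1)
    u2 = w2.ravel() / np.linalg.norm(w2)
    # Clip guards against floating-point values slightly outside [-1, 1].
    return float(np.arccos(np.clip(np.dot(u1, u2), -1.0, 1.0)))

def layerwise_angular_distance(layers1, layers2) -> float:
    """Sum of per-layer angles (an assumed aggregation for illustration)."""
    return sum(angular_distance(a, b) for a, b in zip(layers1, layers2))

# Usage: rescaling any layer by a positive constant does not change the
# distance, unlike a Euclidean distance in weight space.
rng = np.random.default_rng(0)
net_a = [rng.standard_normal((4, 4)) for _ in range(3)]
net_b = [w + 0.01 * rng.standard_normal(w.shape) for w in net_a]
net_b_rescaled = [c * w for c, w in zip([2.0, 0.5, 10.0], net_b)]
d1 = layerwise_angular_distance(net_a, net_b)
d2 = layerwise_angular_distance(net_a, net_b_rescaled)
assert np.isclose(d1, d2)  # invariance to layer-wise positive rescaling
```

For homogeneous networks, rescaling a layer's weights by a positive constant rescales the output without changing the classifier's decisions, which is why a rescaling-invariant distance of this kind is the natural object to track instead of the Euclidean distance in weights.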
Pages: 382-394
Page count: 13
Related papers
50 records in total
  • [21] Overparametrized Multi-layer Neural Networks: Uniform Concentration of Neural Tangent Kernel and Convergence of Stochastic Gradient Descent
    Xu, Jiaming
    Zhu, Hanjing
    JOURNAL OF MACHINE LEARNING RESEARCH, 2024, 25 : 1 - 83
  • [22] Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks
    Vasudevan, Shrihari
    ENTROPY, 2020, 22 (05)
  • [23] Stochastic gradient descent analysis for the evaluation of a speaker recognition
    Nasef, Ashrf
    Marjanovic-Jakovljevic, Marina
    Njegus, Angelina
    ANALOG INTEGRATED CIRCUITS AND SIGNAL PROCESSING, 2017, 90 (02) : 389 - 397
  • [24] Adjusted stochastic gradient descent for latent factor analysis
    Li, Qing
    Xiong, Diwen
    Shang, Mingsheng
    INFORMATION SCIENCES, 2022, 588 : 196 - 213
  • [26] Stochastic Gradient Descent for Large-scale Linear Nonparallel SVM
    Tang, Jingjing
    Tian, Yingjie
    Wu, Guoqiang
    Li, Dewei
    2017 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2017), 2017, : 980 - 983
  • [27] On the Convergence of Stochastic Gradient Descent for Linear Inverse Problems in Banach Spaces
    Jin, Bangti
    Kereta, Zeljko
    SIAM JOURNAL ON IMAGING SCIENCES, 2023, 16 (02): : 671 - 705
  • [28] Accelerating deep neural network training with inconsistent stochastic gradient descent
    Wang, Linnan
    Yang, Yi
    Min, Renqiang
    Chakradhar, Srimat
    NEURAL NETWORKS, 2017, 93 : 219 - 229
  • [29] Convergence rates for shallow neural networks learned by gradient descent
    Braun, Alina
    Kohler, Michael
    Langer, Sophie
    Walk, Harro
    BERNOULLI, 2024, 30 (01) : 475 - 502
  • [30] A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions
    Jentzen, Arnulf
    Riekert, Adrian
    ZEITSCHRIFT FÜR ANGEWANDTE MATHEMATIK UND PHYSIK, 2022, 73