A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

Cited by: 4
Authors
Jentzen, Arnulf [1 ,2 ,3 ]
Riekert, Adrian [3 ]
Affiliations
[1] Chinese University of Hong Kong, School of Data Science, Shenzhen, China
[2] Chinese University of Hong Kong, Shenzhen Research Institute of Big Data, Shenzhen, China
[3] University of Münster, Institute for Analysis and Numerics, Applied Mathematics, Münster, Germany
Source
ZEITSCHRIFT FÜR ANGEWANDTE MATHEMATIK UND PHYSIK | 2022, Vol. 73, No. 5
Keywords
Artificial intelligence; Neural networks; Stochastic gradient descent; Non-convex optimization
DOI
10.1007/s00033-022-01716-w
Chinese Library Classification
O29 [Applied Mathematics]
Subject Classification Code
070104
Abstract
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully connected feedforward artificial neural networks with ReLU activation. The main result of this work proves that the risk of the SGD process converges to zero if the target function under consideration is constant. In the established convergence result the considered artificial neural networks consist of one input layer, one hidden layer, and one output layer (with $d \in \mathbb{N}$ neurons on the input layer, $H \in \mathbb{N}$ neurons on the hidden layer, and one neuron on the output layer). The learning rates of the SGD process are assumed to be sufficiently small, and the input data used in the SGD process to train the artificial neural networks are assumed to be independent and identically distributed.
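
For intuition, the following is a minimal NumPy sketch of the setting described in the abstract: a one-hidden-layer ReLU network with $d$ input neurons, $H$ hidden neurons, and one output neuron, trained by plain SGD on i.i.d. inputs against a constant target function. All concrete choices below (dimensions, learning rate, input distribution, and the constant target value) are illustrative assumptions, not values taken from the paper.

import numpy as np

# Illustrative assumptions (not from the paper): widths, learning rate,
# input distribution, and the constant target value are placeholders.
d, H = 3, 16          # input and hidden layer widths
gamma = 1e-2          # small constant learning rate
c = 1.0               # constant target function f(x) = c
rng = np.random.default_rng(0)

# Network parameters: hidden weights W (H x d), hidden biases b (H),
# output weights v (H), output bias a.
W = rng.normal(size=(H, d))
b = rng.normal(size=H)
v = rng.normal(size=H)
a = 0.0

def realization(x):
    """One-hidden-layer ReLU network: x -> v . relu(W x + b) + a."""
    return v @ np.maximum(W @ x + b, 0.0) + a

for step in range(20_000):
    x = rng.uniform(-1.0, 1.0, size=d)   # i.i.d. input sample
    z = W @ x + b
    h = np.maximum(z, 0.0)
    err = (v @ h + a) - c                # residual against the constant target
    active = (z > 0.0).astype(float)     # ReLU subgradient (taken as 0 at the kink)
    # SGD step on the single-sample squared loss (err**2) / 2.
    grad_v = err * h
    grad_a = err
    grad_b = err * v * active
    grad_W = np.outer(grad_b, x)
    v -= gamma * grad_v
    a -= gamma * grad_a
    b -= gamma * grad_b
    W -= gamma * grad_W

print("squared residual on a fresh sample:",
      (realization(rng.uniform(-1, 1, size=d)) - c) ** 2)

In the paper the risk is the expected squared error with respect to the input distribution; the per-sample squared residual printed above is only a rough empirical proxy for that quantity.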
Pages: 30