Deterministic convergence of an online gradient method for neural networks

Cited by: 33
Authors
Wu, W [1]
Xu, YS
Affiliations
[1] Dalian Univ Technol, Dept Math, Dalian 116023, Peoples R China
[2] N Dakota State Univ, Dept Math, Fargo, ND 58105 USA
[3] Acad Sinica, Math Inst, Beijing 100080, Peoples R China
Keywords
online stochastic gradient method; nonlinear feedforward neural networks; deterministic convergence; monotonicity; constant learning rate
DOI
10.1016/S0377-0427(01)00571-4
Chinese Library Classification (CLC) number
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
The online gradient method has been widely used as a learning algorithm for neural networks. We establish a deterministic convergence of online gradient methods for the training of a class of nonlinear feedforward neural networks when the training examples are linearly independent. We choose the learning rate eta to be a constant during the training procedure. The monotonicity of the error function in the iteration is proved. A criterion for choosing the learning rate eta is also provided to guarantee the convergence. Under certain conditions similar to those for the classical gradient methods, an optimal convergence rate for our online gradient methods is proved. (C) 2001 Elsevier Science B.V. All rights reserved.
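The setting described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm or network class: it assumes a single sigmoid unit, a squared-error function, three hand-picked linearly independent training examples, and a small constant learning rate `eta = 0.1`. The names `EXAMPLES`, `TARGETS`, and `online_gradient` are hypothetical. The weights are updated after each example, presented in a fixed order, and the total error is recorded after every full cycle so that its monotone decrease can be checked.

```python
import math

# Assumed toy data: three linearly independent inputs in R^3 and targets in (0, 1).
EXAMPLES = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (1.0, 1.0, 1.0)]
TARGETS = [0.9, 0.1, 0.8]


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def total_error(w, examples, targets):
    # E(w) = 1/2 * sum_j (g(w . x_j) - o_j)^2
    return 0.5 * sum(
        (sigmoid(dot(w, x)) - o) ** 2 for x, o in zip(examples, targets)
    )


def online_gradient(examples, targets, eta, cycles):
    """Update the weights after every single example, in a fixed order,
    with a constant learning rate eta. Returns the final weights and the
    total error recorded before training and after each full cycle."""
    w = [0.0] * len(examples[0])
    errors = [total_error(w, examples, targets)]
    for _ in range(cycles):
        for x, o in zip(examples, targets):
            y = sigmoid(dot(w, x))
            # Derivative of 1/2 * (y - o)^2 with respect to the pre-activation.
            delta = (y - o) * y * (1.0 - y)
            w = [wi - eta * delta * xi for wi, xi in zip(w, x)]
        errors.append(total_error(w, examples, targets))
    return w, errors


if __name__ == "__main__":
    w, errors = online_gradient(EXAMPLES, TARGETS, eta=0.1, cycles=200)
    monotone = all(b <= a for a, b in zip(errors, errors[1:]))
    print("initial error:", errors[0])
    print("final error:", errors[-1])
    print("error non-increasing over cycles:", monotone)
```

The learning rate here was chosen by hand to be small for this toy problem; the criterion for choosing eta that the abstract mentions is not reproduced.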
Pages: 335-347
Page count: 13