Natural conjugate gradient training of multilayer perceptrons

Cited by: 15
Authors
Gonzalez, Ana
Dorronsoro, Jose R. [1]
Affiliations
[1] Univ Autonoma Madrid, Dpto Ingn Informat, E-28049 Madrid, Spain
Keywords
multilayer perceptrons; natural gradient; conjugate gradient
DOI
10.1016/j.neucom.2007.11.035
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Natural gradient (NG) descent, arguably the fastest on-line method for multilayer perceptron (MLP) training, exploits the "natural" Riemannian metric that the Fisher information matrix defines in the MLP weight space. It also accelerates ordinary gradient descent in a batch setting, but there the Fisher matrix essentially coincides with the Gauss-Newton approximation of the Hessian of the MLP square error function, so NG is related to the Levenberg-Marquardt (LM) method, which may explain its speed-up with respect to standard gradient descent. However, this comparison is not entirely favourable to NG descent, since it should achieve only linear convergence in the Riemannian weight space, whereas the LM method converges superlinearly in the Euclidean weight space. This suggests that it may be interesting to consider superlinear methods for MLP training in a Riemannian setting. In this work we discuss how to introduce a natural conjugate gradient (CG) method for MLP training. While a fully Riemannian formulation would result in an extremely costly procedure, we make some simplifying assumptions that should give descent directions with properties similar to those of standard CG descent. Moreover, we show numerically that natural CG may lead to faster convergence to better minima, although at a greater cost than standard CG; this extra cost can, nevertheless, be alleviated by a diagonal natural CG variant. (C) 2008 Elsevier B.V. All rights reserved.
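As a rough illustration of the batch NG step described in the abstract (a minimal sketch, not the authors' implementation), the following Python/NumPy fragment performs one damped natural-gradient update for a one-hidden-layer MLP with square error, using the outer-product Gauss-Newton/Fisher matrix G = (1/N) sum_i J_i J_i^T. The function names, the damping constant and the toy data are assumptions made only for this example.

import numpy as np

def mlp_forward(w, X, n_hidden):
    # one-hidden-layer MLP with tanh hidden units and a linear scalar output
    n_in = X.shape[1]
    W1 = w[: n_in * n_hidden].reshape(n_in, n_hidden)   # hidden-layer weights
    w2 = w[n_in * n_hidden :]                            # output weights
    H = np.tanh(X @ W1)
    return H @ w2, H, w2

def per_sample_jacobian(w, X, n_hidden):
    # J[i] = d y_i / d w, the gradient of the network output for sample i
    y, H, w2 = mlp_forward(w, X, n_hidden)
    dH = 1.0 - H ** 2                                    # tanh derivative
    J1 = (X[:, :, None] * (dH * w2)[:, None, :]).reshape(X.shape[0], -1)
    return y, np.hstack([J1, H])

def natural_gradient_step(w, X, t, n_hidden, lr=0.1, damping=1e-3):
    # one damped natural-gradient (Gauss-Newton/Fisher) step for 0.5*mean squared error
    N = X.shape[0]
    y, J = per_sample_jacobian(w, X, n_hidden)
    err = y - t
    grad = J.T @ err / N                                 # ordinary gradient
    G = J.T @ J / N                                      # empirical Fisher / Gauss-Newton matrix
    nat_grad = np.linalg.solve(G + damping * np.eye(G.shape[0]), grad)
    return w - lr * nat_grad

# toy usage: fit y = sin(x) on a few random points
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(64, 1))
t = np.sin(X[:, 0])
n_hidden = 5
w = rng.normal(scale=0.5, size=1 * n_hidden + n_hidden)
for _ in range(50):
    w = natural_gradient_step(w, X, t, n_hidden)
print("final MSE:", np.mean((mlp_forward(w, X, n_hidden)[0] - t) ** 2))

A natural CG variant would reuse the same damped matrix G to define the inner products for the conjugate directions, and the diagonal variant mentioned in the abstract would replace G by its diagonal to cut the cost of the solve.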
Pages: 2499-2506
Number of pages: 8
Related papers
16 records in total
[1] Amari S, Park H, Fukumizu K. Adaptive method of realizing natural gradient learning for multilayer perceptrons. Neural Computation, 2000, 12(6): 1399-1409.
[2] Amari S. Natural gradient works efficiently in learning. Neural Computation, 1998, 10(2): 251-276.
[3] Amari S I. Methods of Information Geometry, Vol. 191, 2007.
[4] [Anonymous]. Neural Networks: Tricks of the Trade, 1998.
[5] Fiori S, Amari S. Geometrical methods in neural networks and learning. Neurocomputing, 2005, 67(1-4 Suppl.): 1-7.
[6] González A. Lecture Notes in Computer Science, 2006, 4131: 169.
[7] Heskes T. On "natural" learning and pruning in multilayered perceptrons. Neural Computation, 2000, 12(4): 881-901.
[8] Igel C. International Series of Numerical Mathematics, Vol. 151, 2005.
[9] Lee J. Riemannian Manifolds: An Introduction to Curvature, 1997.
[10] Murphy P M. UCI Repository of Machine Learning Databases, 1994.