The normalized risk-averting error criterion for avoiding nonglobal local minima in training neural networks

Cited by: 4
Authors
Lo, James Ting-Ho [1 ]
Gui, Yichuan [2 ]
Peng, Yun [2 ]
Affiliations
[1] Univ Maryland, Dept Math & Stat, Baltimore, MD 21250 USA
[2] Univ Maryland, Dept Comp Sci & Elect Engn, Baltimore, MD 21250 USA
Funding
US National Science Foundation;
Keywords
Neural network; Training; Convexification; Risk-averting error; Global optimization; Local minimum;
DOI
10.1016/j.neucom.2013.11.056
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The convexification method for data fitting is capable of avoiding nonglobal local minima, but it suffers from two shortcomings: the risk-averting error (RAE) criterion grows exponentially as its risk-sensitivity index λ increases, and the existing method of determining λ is often ineffective. To eliminate these shortcomings, the normalized RAE (NRAE) is herein proposed. Because NRAE is a monotonically increasing function of RAE, the region of the NRAE landscape free of nonglobal local minima expands with λ just as that of RAE does; unlike RAE, however, NRAE does not grow unboundedly. The performance of training with NRAE at a fixed λ is reported. Over a large range of the risk-sensitivity index, such training achieves a global or near-global minimum at a high rate from different initial weight vectors of the neural network under training. It is observed that at a large λ the NRAE landscape is rather flat, which slows the training virtually to a halt. This observation motivates the NRAE-MSE method, which exploits the large region of the NRAE landscape free of nonglobal local minima and takes occasional excursions of training with the standard mean squared error (MSE) to zero in on a global or near-global minimum. The method is tested on a number of function-approximation examples involving fine features or under-sampled segments. Numerical experiments show that the NRAE-MSE training method succeeds in 100% of the trials for each example, all starting from randomly selected initial weights. The method is also applied to classifying numerals in the well-known MNIST dataset, where it outperforms other methods reported in the literature under the same operating conditions. © 2014 Elsevier B.V. All rights reserved.
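The NRAE criterion is not written out in this record; the following is a minimal sketch, assuming the log-sum-exp form used in Lo's related work, C_λ(w) = (1/λ) ln((1/K) Σ_k exp(λ e_k(w)²)). Under that assumption, NRAE is a monotone transform of RAE, tends to the MSE as λ → 0+, and stays bounded where RAE overflows. The function names and the synthetic residuals below are illustrative, not the authors' code.

```python
import numpy as np

def rae(errors, lam):
    """Risk-averting error: grows exponentially in lam and overflows quickly."""
    return np.mean(np.exp(lam * errors**2))

def nrae(errors, lam):
    """Normalized RAE via a numerically stable log-sum-exp.

    Assumed form (a sketch): C_lam = (1/lam) * ln(mean(exp(lam * e_k^2))).
    Monotone in RAE, tends to the MSE as lam -> 0+, and stays bounded:
    it lies between max(e^2) - ln(K)/lam and max(e^2).
    """
    z = lam * errors**2
    m = z.max()                         # shift before exponentiating
    return (m + np.log(np.mean(np.exp(z - m)))) / lam

errors = np.random.randn(100)           # residuals e_k = y_k - f(x_k, w)
print(rae(errors, 1e4))                 # inf: RAE overflows at large lam
print(nrae(errors, 1e4))                # finite, close to the max squared error
print(nrae(errors, 1e-6), np.mean(errors**2))  # ~MSE for small lam
```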
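The NRAE-MSE method itself is described above only in prose. The loop below is a hypothetical sketch of that alternating structure; `grad_nrae`, `grad_mse`, and `mse` stand in for whatever gradient and loss routines a given implementation provides, and a quasi-Newton optimizer would typically replace the plain gradient steps. Excursions here are non-committal: training resumes from the NRAE iterate unless an excursion drives the MSE below tolerance.

```python
import numpy as np

def nrae_mse_train(w0, grad_nrae, grad_mse, mse, lam,
                   steps=10_000, excursion_every=500,
                   excursion_steps=50, lr=1e-3, tol=1e-8):
    """Hypothetical sketch of the alternating NRAE-MSE scheme.

    Main descent runs on the NRAE landscape (largely free of nonglobal
    local minima at large lam); a periodic MSE excursion tries to zero
    in on a global or near-global minimum from the current iterate.
    """
    w = np.asarray(w0, dtype=float).copy()
    best_w, best_mse = w.copy(), mse(w)
    for t in range(steps):
        w -= lr * grad_nrae(w, lam)            # NRAE descent step
        if (t + 1) % excursion_every == 0:     # occasional MSE excursion
            w_exc = w.copy()
            for _ in range(excursion_steps):
                w_exc -= lr * grad_mse(w_exc)  # plain MSE descent
            m = mse(w_exc)
            if m < tol:                        # excursion zeroed in: stop
                return w_exc
            if m < best_mse:                   # keep best excursion result
                best_w, best_mse = w_exc.copy(), m
    return best_w
```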
Pages: 3-12
Page count: 10