Overtraining, regularization and searching for a minimum, with application to neural networks

Cited by: 99
Authors
Sjoberg, J
Ljung, L
Affiliation
[1] Department of Electrical Engineering, Linkoping University, Linkoping
Funding
Swedish Research Council
DOI
10.1080/00207179508921605
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In this paper we discuss the role of criterion minimization as a means for parameter estimation. Most traditional methods, such as maximum likelihood and prediction error identification, are based on this principle. However, somewhat surprisingly, it turns out that it is not always 'optimal' to try to find the absolute minimum point of the criterion. The reason is that 'stopped minimization' (where the iterations are terminated before the absolute minimum has been reached) has more or less the same properties as regularization (adding a parametric penalty term). Regularization is known to have beneficial effects on the variance of the parameter estimates, and it reduces the 'variance contribution' of the misfit. This also explains the concept of 'overtraining' in neural nets. How, then, does one know when to terminate the iterations? A useful criterion would be to stop iterating when the criterion function applied to a validation data set no longer decreases. However, in this paper we show that applying this technique extensively may result in an unregularized estimate for the total data set: estimation + validation data.
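The link between stopped minimization and regularization stated in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: it assumes a quadratic criterion V(θ) = ½(θ − θ̂)ᵀH(θ − θ̂), plain gradient descent started from zero, and a ridge penalty (δ/2)|θ|². The eigenvalues, step size, iteration count, and the matching rule δ ≈ 1/(step · iterations) are hypothetical choices for illustration only.

```python
def gd_shrinkage(eigenvalue, step, iterations):
    """Fraction of the unregularized minimizer reached along one Hessian
    eigendirection after `iterations` gradient steps started from zero
    (assumes a quadratic criterion and a fixed step size)."""
    return 1.0 - (1.0 - step * eigenvalue) ** iterations


def ridge_shrinkage(eigenvalue, delta):
    """Fraction of the unregularized minimizer kept along the same
    eigendirection when the penalty (delta / 2) * |theta|^2 is added."""
    return eigenvalue / (eigenvalue + delta)


# Hypothetical Hessian spectrum: well-determined to poorly determined directions.
eigenvalues = [10.0, 1.0, 0.1, 0.01]
step = 0.05        # needs step * max(eigenvalues) < 2 for convergence
iterations = 40    # the point at which the minimization is stopped
# Rough heuristic (an assumption): matches the two factors for small eigenvalues.
delta = 1.0 / (step * iterations)

for lam in eigenvalues:
    gd = gd_shrinkage(lam, step, iterations)
    ridge = ridge_shrinkage(lam, delta)
    print(f"eigenvalue={lam:6.2f}  stopped GD={gd:.3f}  ridge={ridge:.3f}")
```

Under these assumptions, both factors stay close to 1 for large eigenvalues and close to 0 for small ones, so poorly determined parameter directions are suppressed in either case. Letting `iterations` grow drives every factor toward 1, which corresponds to the unregularized estimate.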
Pages: 1391-1407
Number of pages: 17