Temporal Evolution of Generalization during Learning in Linear Networks

Cited by: 24
Authors
Baldi, Pierre [1,2]
Chauvin, Yves [3,4]
Affiliations
[1] CALTECH, Jet Prop Lab, 4800 Oak Grove Dr, Pasadena, CA 91109 USA
[2] CALTECH, Div Biol, Pasadena, CA 91125 USA
[3] Stanford Univ, Dept Psychol, Stanford, CA 94305 USA
[4] NET ID Inc, Menlo Pk, CA 94025 USA
DOI
10.1162/neco.1991.3.4.589
Chinese Library Classification (CLC) code
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study generalization in a simple framework of feedforward linear networks with n inputs and n outputs, trained from examples by gradient descent on the usual quadratic error function. We derive analytical results on the behavior of the validation function corresponding to the LMS error function calculated on a set of validation patterns. We show that the behavior of the validation function depends critically on the initial conditions and on the characteristics of the noise. Under certain simple assumptions, if the initial weights are sufficiently small, the validation function has a unique minimum corresponding to an optimal stopping time for training, for which simple bounds can be calculated. There also exist situations in which the validation function can exhibit more complicated and somewhat unexpected behavior, such as multiple local minima (at most n) of variable depth and long but finite plateau effects. Additional results and possible extensions are briefly discussed.
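The optimal-stopping phenomenon described in the abstract is straightforward to reproduce numerically. The following is a minimal sketch, not taken from the paper: it trains a linear map y = Wx by batch gradient descent on the quadratic error, starting from small initial weights, and records the LMS error on a held-out validation set. The teacher map T, noise level, learning rate, and problem sizes are illustrative assumptions; under the paper's assumptions the validation curve first decreases, passes through a minimum (the optimal stopping time), and eventually rises as the network begins to fit the noise.

```python
# Sketch (assumptions throughout): linear network y = W x with n inputs and
# n outputs, trained by batch gradient descent on the quadratic (LMS) error.
import numpy as np

rng = np.random.default_rng(0)
n, p_train, p_val = 10, 40, 40         # dimensions and pattern counts (assumed)
T = rng.normal(size=(n, n))            # unknown teacher map (assumed)

def make_patterns(p, noise=0.5):
    """Draw p input patterns and noisy teacher targets."""
    X = rng.normal(size=(n, p))
    Y = T @ X + noise * rng.normal(size=(n, p))
    return X, Y

X_tr, Y_tr = make_patterns(p_train)
X_va, Y_va = make_patterns(p_val)

W = 1e-3 * rng.normal(size=(n, n))     # sufficiently small initial weights
eta = 0.01                             # learning rate (assumed)
val_curve = []
for t in range(2000):
    # gradient of 0.5 * ||Y - W X||^2 with respect to W, averaged over patterns
    grad = (W @ X_tr - Y_tr) @ X_tr.T / p_train
    W -= eta * grad
    # validation function: LMS error on the held-out validation patterns
    val_curve.append(np.mean((W @ X_va - Y_va) ** 2))

t_opt = int(np.argmin(val_curve))      # empirical optimal stopping time
print(f"validation minimum at step {t_opt}: {val_curve[t_opt]:.4f}")
```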
Pages: 589-603
Number of pages: 15