A theoretical comparison of batch-mode, on-line, cyclic, and almost-cyclic learning

Cited by: 44
Authors
Heskes, T
Wiegerinck, W
Affiliation
Department of Medical Physics and Biophysics, RWCP (Real World Computing Program), University of Nijmegen, NL-6525 EZ Nijmegen
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1996, Vol. 7, No. 4
DOI
10.1109/72.508935
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study and compare different neural network learning strategies: batch-mode learning, on-line learning, cyclic learning, and almost-cyclic learning. Incremental learning strategies require less storage capacity than batch-mode learning. However, due to the arbitrariness in the presentation order of the training patterns, incremental learning is a stochastic process, whereas batch-mode learning is deterministic. In zeroth order, i.e., as the learning parameter eta tends to zero, all learning strategies approximate the same ordinary differential equation, for convenience referred to as the "ideal behavior." Using stochastic methods valid for small learning parameters eta, we derive differential equations describing the evolution of the lowest-order deviations from this ideal behavior. We compute how the asymptotic misadjustment, measuring the average asymptotic distance from a stable fixed point of the ideal behavior, scales as a function of the learning parameter and the number of training patterns. Knowing the asymptotic misadjustment, we calculate the typical number of learning steps necessary to generate a weight within order epsilon of this fixed point, both with fixed and time-dependent learning parameters. We conclude that almost-cyclic learning (learning with random cycles) is a better alternative to batch-mode learning than cyclic learning (learning with a fixed cycle).
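As an illustration (not taken from the paper), the following Python sketch applies the four presentation strategies to a toy noisy least-squares problem, whose least-squares solution plays the role of the stable fixed point of the ideal behavior. The quadratic loss, the learning parameter eta = 0.05, and all helper names are assumptions made for this example; the sketch only lets one compare how close each strategy ends up to the fixed point.

import numpy as np

rng = np.random.default_rng(0)
P, d = 20, 3                                  # number of training patterns, weight dimension
X = rng.normal(size=(P, d))
y = X @ rng.normal(size=d) + 0.5 * rng.normal(size=P)   # noisy targets
w_ls = np.linalg.lstsq(X, y, rcond=None)[0]   # stable fixed point of the ideal ODE

def grad(w, idx):
    # Gradient of the mean squared error on the patterns indexed by idx.
    Xi, yi = X[idx], y[idx]
    return Xi.T @ (Xi @ w - yi) / len(idx)

def train(strategy, eta=0.05, epochs=500):
    w = np.zeros(d)
    order = np.arange(P)
    for _ in range(epochs):
        if strategy == "batch":               # one deterministic full-gradient step per epoch
            w -= eta * grad(w, order)
        else:
            if strategy == "online":          # patterns drawn independently at random
                idx = rng.integers(P, size=P)
            elif strategy == "almost-cyclic": # a fresh random permutation every cycle
                idx = rng.permutation(P)
            else:                             # cyclic: the same fixed order every cycle
                idx = order
            for i in idx:                     # one incremental step per presented pattern
                w -= eta * grad(w, [i])
    return np.linalg.norm(w - w_ls)           # asymptotic distance from the fixed point

for s in ("batch", "online", "cyclic", "almost-cyclic"):
    print(f"{s:>13}: |w - w_ls| = {train(s):.3e}")

For small eta, batch-mode learning settles essentially onto the fixed point, while the incremental strategies hover around it with a strategy- and eta-dependent misadjustment, which is the quantity whose scaling the paper analyzes.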
Pages: 919-925
Page count: 7