Statistical Theory of Learning Curves under Entropic Loss Criterion

Cited by: 88
Authors
AMARI, S
MURATA, N
DOI
10.1162/neco.1993.5.1.140
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The present paper elucidates a universal property of learning curves, which shows how the generalization error, training error, and the complexity of the underlying stochastic machine are related and how the behavior of a stochastic machine is improved as the number of training examples increases. The error is measured by the entropic loss. It is proved that the generalization error converges to H0, the entropy of the conditional distribution of the true machine, as H0 + m*/(2t), while the training error converges as H0 - m*/(2t), where t is the number of examples and m* shows the complexity of the network. When the model is faithful, implying that the true machine is in the model, m* is reduced to m, the number of modifiable parameters. This is a universal law because it holds for any regular machine irrespective of its structure under the maximum likelihood estimator. Similar relations are obtained for the Bayes and Gibbs learning algorithms. These learning curves show the relation among the accuracy of learning, the complexity of a model, and the number of training examples.
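The two asymptotic learning curves stated in the abstract can be evaluated numerically. The following is a minimal sketch, not code from the paper: the function names and the sample values of H0 and m* are illustrative assumptions, and the formulas are only the leading-order expressions H0 + m*/(2t) and H0 - m*/(2t) quoted above.

```python
def generalization_error(h0: float, m_star: float, t: int) -> float:
    """Leading-order generalization error H0 + m*/(2t), as stated in the abstract."""
    if t <= 0:
        raise ValueError("t (number of training examples) must be positive")
    return h0 + m_star / (2 * t)


def training_error(h0: float, m_star: float, t: int) -> float:
    """Leading-order training error H0 - m*/(2t), as stated in the abstract."""
    if t <= 0:
        raise ValueError("t (number of training examples) must be positive")
    return h0 - m_star / (2 * t)


if __name__ == "__main__":
    # Hypothetical values chosen only for illustration; they are not from the paper.
    h0, m_star = 0.5, 10.0
    for t in (10, 100, 1000):
        gen = generalization_error(h0, m_star, t)
        train = training_error(h0, m_star, t)
        # The gap between the two curves is m*/t, so it shrinks as t grows.
        print(f"t={t}: generalization={gen:.4f}, training={train:.4f}, gap={gen - train:.4f}")
```

Under these expressions the gap between generalization and training error is m*/t, which in the faithful case becomes m/t, with m the number of modifiable parameters.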
Pages: 140-153
Number of pages: 14
Related papers
25 in total
  • [1] NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION
    AKAIKE, H
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) : 716 - 723
  • [2] 4 TYPES OF LEARNING-CURVES
    AMARI, S
    FUJITA, N
    SHINOMOTO, S
    [J]. NEURAL COMPUTATION, 1992, 4 (04) : 605 - 618
  • [3] A THEORY OF ADAPTIVE PATTERN CLASSIFIERS
    AMARI, S
    [J]. IEEE TRANSACTIONS ON ELECTRONIC COMPUTERS, 1967, EC16 (03) : 299 - +
  • [4] DUALISTIC GEOMETRY OF THE MANIFOLD OF HIGHER-ORDER NEURONS
    AMARI, S
    [J]. NEURAL NETWORKS, 1991, 4 (04) : 443 - 451
  • [5] AMARI S, 1992, METR9203 U TOK
  • [6] AMARI S, 1985, SPRINGER LECTURE NOT, V28
  • [7] What Size Net Gives Valid Generalization?
    Baum, Eric B.
    Haussler, David
    [J]. NEURAL COMPUTATION, 1989, 1 (01) : 151 - 160
  • [8] Gyorgyi G., 1990, NEURAL NETWORKS SPIN, P3, DOI 10.1142/0938
  • [9] LEARNING FROM EXAMPLES IN A SINGLE-LAYER NEURAL NETWORK
    HANSEL, D
    SOMPOLINSKY, H
    [J]. EUROPHYSICS LETTERS, 1990, 11 (07) : 687 - 692
  • [10] LEARNING-PROCESSES IN NEURAL NETWORKS
    HESKES, TM
    KAPPEN, B
    [J]. PHYSICAL REVIEW A, 1991, 44 (04) : 2718 - 2726