Relative loss bounds for single neurons

Cited by: 39
Authors
Helmbold, DP [1 ]
Kivinen, J
Warmuth, MK
Affiliations
[1] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
[2] Univ Helsinki, Dept Comp Sci, FIN-00014 Helsinki, Finland
Source
IEEE TRANSACTIONS ON NEURAL NETWORKS | 1999, Vol. 10, No. 6
Funding
US National Science Foundation;
Keywords
exponentiated gradient algorithm; generalized linear regression; matching loss function; relative loss bounds; sigmoid transfer function;
DOI
10.1109/72.809075
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We analyze and compare the well-known gradient descent algorithm and the more recent exponentiated gradient algorithm for training a single neuron with an arbitrary transfer function. Both algorithms are easily generalized to larger neural networks, and the generalization of gradient descent is the standard backpropagation algorithm. In this paper we prove worst-case loss bounds for both algorithms in the single neuron case. Since local minima make it difficult to prove worst-case bounds for gradient-based algorithms, we must use a loss function that prevents the formation of spurious local minima. We define such a matching loss function for any strictly increasing differentiable transfer function and prove worst-case loss bounds for any such transfer function and its corresponding matching loss. For example, the matching loss for the identity function is the square loss and the matching loss for the logistic transfer function is the entropic loss. The different forms of the two algorithms' bounds indicate that the exponentiated gradient algorithm outperforms gradient descent when the inputs contain a large number of irrelevant components. Simulations on synthetic data confirm these analytical results.
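As a sketch of the quantities named in the abstract (the symbols w for the weight vector, x for the input, y for the target, \phi for the transfer function, and \eta for the learning rate are notation chosen here for illustration, not fixed by this record): for a strictly increasing, differentiable transfer function \phi with prediction \hat y = \phi(w \cdot x), the matching loss can be written as

\[
L_\phi(y,\hat y) \;=\; \int_{\phi^{-1}(y)}^{\phi^{-1}(\hat y)} \bigl(\phi(z) - y\bigr)\, dz .
\]

For the identity \phi(a)=a this gives \tfrac12(\hat y - y)^2, the square loss up to a constant factor; for the logistic \phi(a)=1/(1+e^{-a}) it gives the entropic loss y\ln\frac{y}{\hat y} + (1-y)\ln\frac{1-y}{1-\hat y}. The two algorithms compared in the paper differ only in their weight update; with \hat y_t = \phi(w_t \cdot x_t), a standard form (sign and normalization conventions vary across presentations) is

\[
\text{GD:}\quad w_{t+1} = w_t - \eta\,(\hat y_t - y_t)\,x_t,
\qquad
\text{EG:}\quad w_{t+1,i} = \frac{w_{t,i}\,e^{-\eta(\hat y_t - y_t)\,x_{t,i}}}{\sum_j w_{t,j}\,e^{-\eta(\hat y_t - y_t)\,x_{t,j}}} .
\]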
Pages: 1291-1304
Number of pages: 14