Analysis of two gradient-based algorithms for on-line regression

Cited: 33
Authors
Cesa-Bianchi, N [1]
Affiliation
[1] Univ Milan, DSI, I-20135 Milan, Italy
Keywords
DOI
10.1006/jcss.1999.1635
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
In this paper we present a new analysis of two algorithms, Gradient Descent and Exponentiated Gradient, for solving regression problems in the on-line framework. Both algorithms compute a prediction that depends linearly on the current instance, and then update the coefficients of this linear combination according to the gradient of the loss function. However, the two algorithms use the gradient information in distinctive ways when updating the coefficients. For each algorithm, we show general regression bounds for any convex loss function. Furthermore, we show special bounds for the absolute and the square loss functions, thus extending previous results by Kivinen and Warmuth. In the nonlinear regression case, we show general bounds for pairs of transfer and loss functions satisfying a certain condition. We apply this result to the Hellinger loss and the entropic loss in the case of logistic regression (similar results, but only for the entropic loss, were also obtained by Helmbold et al. using a different analysis). Finally, we describe the connection between our approach and a general family of gradient-based algorithms proposed by Warmuth et al. in recent works. (C) 1999 Academic Press.
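The two update rules contrasted in the abstract can be illustrated with a short sketch: both algorithms predict linearly in the current instance and step along the gradient of the loss, but Gradient Descent applies an additive update while Exponentiated Gradient applies a multiplicative one, keeping its weights on the probability simplex. This is a minimal illustration only, not the paper's analysis; the function name, the fixed learning rate `eta`, and the choice of square loss are assumptions for the example.

```python
import numpy as np

def online_regression(X, y, eta=0.05, update="gd"):
    """One pass of on-line linear regression with square loss.

    update="gd" -> Gradient Descent (additive update)
    update="eg" -> Exponentiated Gradient (multiplicative update;
                   weights are renormalized onto the simplex)
    """
    n, d = X.shape
    # EG conventionally starts from the uniform weight vector;
    # GD starts from the zero vector.
    w = np.full(d, 1.0 / d) if update == "eg" else np.zeros(d)
    losses = []
    for x_t, y_t in zip(X, y):
        y_hat = w @ x_t                    # prediction is linear in the instance
        losses.append((y_hat - y_t) ** 2)  # square loss on this trial
        grad = 2.0 * (y_hat - y_t) * x_t   # gradient of the loss w.r.t. w
        if update == "gd":
            w = w - eta * grad             # additive step
        else:
            w = w * np.exp(-eta * grad)    # multiplicative step
            w = w / w.sum()                # renormalize to the simplex
    return w, losses
```

The multiplicative form is what yields EG's logarithmic dependence on the dimension in the bounds discussed above, at the price of restricting the comparison class to convex combinations of the features.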
Pages: 392-411
Page count: 20
References
23 records in total
[1]  
Bylander, T, 1997, Proceedings of the 10th Annual Conference on Computational Learning Theory, P485
[2]   How to use expert advice [J].
Cesa-Bianchi, N ;
Freund, Y ;
Haussler, D ;
Helmbold, DP ;
Schapire, RE ;
Warmuth, MK .
JOURNAL OF THE ACM, 1997, 44 (03) :427-485
[3]   Worst-case quadratic loss bounds for prediction using linear functions and gradient descent [J].
Cesa-Bianchi, N ;
Long, PM ;
Warmuth, MK .
IEEE TRANSACTIONS ON NEURAL NETWORKS, 1996, 7 (03) :604-619
[4]  
Cover, TM, 2005, Elements of Information Theory, DOI 10.1002/047174882X
[5]  
Devroye L., 1996, A probabilistic theory of pattern recognition
[6]   UNIVERSAL PREDICTION OF INDIVIDUAL SEQUENCES [J].
FEDER, M ;
MERHAV, N ;
GUTMAN, M .
IEEE TRANSACTIONS ON INFORMATION THEORY, 1992, 38 (04) :1258-1270
[7]  
Freund Y., 1996, Proceedings of the Ninth Annual Conference on Computational Learning Theory, P89, DOI 10.1145/238061.238072
[8]  
Grove A. J., 1997, Proceedings of the Tenth Annual Conference on Computational Learning Theory, P171, DOI 10.1145/267460.267493
[9]   DECISION THEORETIC GENERALIZATIONS OF THE PAC MODEL FOR NEURAL NET AND OTHER LEARNING APPLICATIONS [J].
HAUSSLER, D .
INFORMATION AND COMPUTATION, 1992, 100 (01) :78-150
[10]  
HELMBOLD DP, 1997, Advances in Neural Information Processing Systems, V8, P309