The Generalization Ability of Online Algorithms for Dependent Data

被引:54
作者
Agarwal, Alekh [1 ]
Duchi, John C. [1 ]
机构
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
关键词
Dependent observations; generalization bounds; linear prediction; online learning; statistical learning theory; CONVERGENCE; STABILITY; TRACKING; RATES;
D O I
10.1109/TIT.2012.2212414
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We study the generalization performance of online learning algorithms trained on samples coming from a dependent source of data. We show that the generalization error of any stable online algorithm concentrates around its regret-an easily computable statistic of the online performance of the algorithm-when the underlying ergodic process is beta- or phi-mixing. We show high-probability error bounds assuming the loss function is convex, and we also establish sharp convergence rates and deviation bounds for strongly convex losses and several linear prediction problems such as linear and logistic regression, least-squares SVM, and boosting on dependent data. In addition, our results have straightforward applications to stochastic optimization with dependent data, and our analysis requires only martingale convergence arguments; we need not rely on more powerful statistical tools such as empirical process theory.
引用
收藏
页码:573 / 587
页数:15
相关论文
共 31 条
[1]  
[Anonymous], 1986, Probability and Measure
[2]  
[Anonymous], 2009, Adv. Neu-ral Inf. Process. Syst.
[3]  
[Anonymous], 2009, MARKOV CHAINS STOCHA
[4]  
Azuma Kazuoki, 1967, Tohoku Mathematical Journal, V19, P357, DOI [10.2748/tmj/1178243286, DOI 10.2748/TMJ/1178243286]
[5]   Local Rademacher complexities [J].
Bartlett, PL ;
Bousquet, O ;
Mendelson, S .
ANNALS OF STATISTICS, 2005, 33 (04) :1497-1537
[6]   Mirror descent and nonlinear projected subgradient methods for convex optimization [J].
Beck, A ;
Teboulle, M .
OPERATIONS RESEARCH LETTERS, 2003, 31 (03) :167-175
[7]   Stability and generalization [J].
Bousquet, O ;
Elisseeff, A .
JOURNAL OF MACHINE LEARNING RESEARCH, 2002, 2 (03) :499-526
[8]   Basic Properties of Strong Mixing Conditions. A Survey and Some Open Questions [J].
Bradley, Richard C. .
PROBABILITY SURVEYS, 2005, 2 :107-144
[9]   On the generalization ability of on-line learning algorithms [J].
Cesa-Bianchi, N ;
Conconi, A ;
Gentile, C .
IEEE TRANSACTIONS ON INFORMATION THEORY, 2004, 50 (09) :2050-2057
[10]  
Cesa-Bianchi N., 2006, PREDICTION LEARNING