ONLINE LEARNING WITH MARKOV SAMPLING

Cited by: 124
Authors
Smale, Steve [1 ]
Zhou, Ding-Xuan [2 ]
Affiliations
[1] Toyota Technol Inst Chicago, Chicago, IL 60637 USA
[2] City Univ Hong Kong, Dept Math, Kowloon, Hong Kong, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Learning theory; online learning; Markov sampling; reproducing kernel Hilbert space; CLASSIFICATION; ERROR;
DOI
10.1142/S0219530509001293
Chinese Library Classification
O29 [Applied Mathematics];
Discipline Code
070104;
Abstract
This paper attempts to give an extension of learning theory to a setting where the assumption of i.i.d. data is weakened by keeping the independence but abandoning the identical restriction. We hypothesize that a sequence of examples (x_t, y_t) in X × Y for t = 1, 2, 3, ... is drawn from a probability distribution ρ^(t) on X × Y. The marginal probabilities on X are supposed to converge to a limit probability on X. Two main examples for this time process are discussed. The first is a stochastic one which in the special case of a finite space X is defined by a stochastic matrix and more generally by a stochastic kernel. The second is determined by an underlying discrete dynamical system on the space X. Our theoretical treatment requires that this dynamics be hyperbolic (or "Axiom A"), which still permits a class of chaotic systems (with Sinai-Ruelle-Bowen attractors). Even in the case of a limit Dirac point probability, one needs the measure theory to be defined using Hölder spaces. Many implications of our work remain unexplored. These include, for example, the relation to Hidden Markov Models, as well as Markov Chain Monte Carlo methods. It seems reasonable that further work should consider the push forward of the process from X × Y by some kind of observable function to a data space.
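To make the first example concrete, the following is a minimal Python sketch (not the paper's algorithm verbatim) of Markov sampling driving an online learner: states x_t are generated by a row-stochastic transition matrix P on a finite space X, so the marginals converge to the chain's stationary distribution, and a kernel least-squares learner in an RKHS is updated one example at a time. All specifics (the matrix P, the target f_star, the step size eta_t, the kernel width, the noise level) are illustrative assumptions, not taken from the record above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Finite state space X embedded in [0, 1]; a row-stochastic transition matrix.
    X = np.linspace(0.0, 1.0, 5)
    P = np.array([[0.50, 0.50, 0.00, 0.00, 0.00],
                  [0.25, 0.50, 0.25, 0.00, 0.00],
                  [0.00, 0.25, 0.50, 0.25, 0.00],
                  [0.00, 0.00, 0.25, 0.50, 0.25],
                  [0.00, 0.00, 0.00, 0.50, 0.50]])

    def f_star(x):
        """Hypothetical regression target generating the labels."""
        return np.sin(2.0 * np.pi * x)

    def gaussian_kernel(x, xp, width=0.2):
        return np.exp(-((x - xp) ** 2) / (2.0 * width ** 2))

    # Online learner f_t = sum_s a_s K(x_s, .), updated by stochastic gradient
    # descent on the regularized least-squares loss.
    centers, coeffs = [], []
    lam = 0.01          # regularization parameter (illustrative)
    state = 0           # initial state of the Markov chain
    T = 200

    for t in range(1, T + 1):
        x_t = X[state]
        y_t = f_star(x_t) + 0.1 * rng.standard_normal()  # noisy label
        # Current prediction f_{t-1}(x_t).
        pred = sum(a * gaussian_kernel(c, x_t) for c, a in zip(centers, coeffs))
        eta_t = 1.0 / (lam * t + 1.0)  # decaying step size
        # Gradient step: shrink old coefficients, append a new kernel term, i.e.
        # f_t = (1 - eta_t * lam) f_{t-1} - eta_t (f_{t-1}(x_t) - y_t) K(x_t, .).
        coeffs = [(1.0 - eta_t * lam) * a for a in coeffs]
        centers.append(x_t)
        coeffs.append(-eta_t * (pred - y_t))
        # Markov transition: next state drawn from row `state` of P.
        state = rng.choice(len(X), p=P[state])

    print("final prediction at x = 0.5:",
          sum(a * gaussian_kernel(c, 0.5) for c, a in zip(centers, coeffs)))

Because the samples are a Markov chain rather than i.i.d. draws, consecutive examples here are dependent; only the limit distribution of the marginals is fixed, which is exactly the relaxation the abstract describes.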
Pages: 87-113
Page count: 27