Convergence rate of linear two-time-scale stochastic approximation

被引:80
作者
Konda, VR [1 ]
Tsitsiklis, JN [1 ]
机构
[1] MIT, Informat & Decis Syst Lab, Cambridge, MA 02139 USA
关键词
stochastic approximation; two-time-scales;
D O I
10.1214/105051604000000116
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
We study the rate of convergence of linear two-time-scale stochastic approximation methods. We consider two-time-scale linear iterations driven by i.i.d. noise, prove some results on their asymptotic covariance and establish asymptotic normality. The well-known result [Polyak, B. T. (1990). Automat. Remote Contr 51 937-946; Ruppert, D. (1988). Technical Report 781, Cornell Univ.] on the optimality of Polyak-Ruppert averaging techniques specialized to linear stochastic approximation is established as a consequence of the general results in this paper.
引用
收藏
页码:796 / 819
页数:24
相关论文
共 17 条
[1]  
BARAS JS, 2000, P 39 IEEE C DEC CONT
[2]  
Benveniste A, 1990, Adaptive algorithms and stochastic approximations
[3]  
BHATNAGAR, 2001, IIE T, V3, P245
[4]   Optimal structured feedback policies for ABR flow control using two-timescale SPSA [J].
Bhatnagar, S ;
Fu, MC ;
Marcus, SI ;
Fard, PJ .
IEEE-ACM TRANSACTIONS ON NETWORKING, 2001, 9 (04) :479-491
[5]   Stochastic approximation with two time scales [J].
Borkar, VS .
SYSTEMS & CONTROL LETTERS, 1997, 29 (05) :291-294
[6]  
Duflo M., 1997, RANDOM ITERATIVE MOD
[7]   APPLICATIONS OF SINGULAR PERTURBATION TECHNIQUES TO CONTROL-PROBLEMS [J].
KOKOTOVIC, PV .
SIAM REVIEW, 1984, 26 (04) :501-550
[8]  
Konda V., 2002, THESIS MIT
[9]   Actor-critic-type learning algorithms for Markov decision processes [J].
Konda, VR ;
Borkar, VS .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1999, 38 (01) :94-123
[10]   STOCHASTIC-APPROXIMATION WITH AVERAGING OF THE ITERATES - OPTIMAL ASYMPTOTIC RATE OF CONVERGENCE FOR GENERAL PROCESSES [J].
KUSHNER, HJ ;
YANG, JC .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 1993, 31 (04) :1045-1062