The Curse of Memory in Stochastic Approximation

被引：0

作者：

Lauand, Caio Kalil ^{[1
]}

Meyn, Sean ^{[1
]}

机构：

[1] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA

来源：

2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC | 2023年

关键词：

Stochastic Approximation; Stochastic Recursive; Algorithms; TD learning; SPECTRAL THEORY; ODE METHOD; CONVERGENCE;

D O I：

10.1109/CDC49753.2023.1038398

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

Theory and application of stochastic approximation (SA) has grown within the control systems community since the earliest days of adaptive control. This paper takes a new look at the topic, motivated by recent results establishing remarkable performance of SA with (sufficiently small) constant step-size alpha > 0. If averaging is implemented to obtain the final parameter estimate, then the estimates are asymptotically unbiased with nearly optimal asymptotic covariance. These results have been obtained for random linear SA recursions with i.i.d. coefficients. This paper obtains very different conclusions in the more common case of geometrically ergodic Markovian disturbance: (i) The target bias is identified, even in the case of non-linear SA, and is in general non-zero. The remaining results are established for linear SA recursions: (ii) the bivariate parameter-disturbance process is geometrically ergodic in a topological sense; (iii) the representation for bias has a simpler form in this case, and cannot be expected to be zero if there is multiplicative noise; (iv) the asymptotic covariance of the averaged parameters is within O(alpha) of optimal. The error term is identified, and may be massive if mean dynamics are not well conditioned. The theory is illustrated with application to TD-learning.

引用

页码：7803 / 7809

页数：7

共 19 条

[1]

Bach F., 2013, Advances in neural information processing systems, P773

[2]

Borkar V. S., 2021, Stochastic Approximation: A Dynamical Systems Viewpoint, V2nd

[3] The ODE method for convergence of stochastic approximation and reinforcement learning [J].

Borkar, VS ;

Meyn, SP .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2000, 38 (02) :447-469

[4] A representation theorem for the error of recursive estimators [J].

Gerencsér, L .

SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2006, 44 (06) :2123-2188

[5]

Huang JY, 2002, LECT NOTES CONTR INF, V280, P205

[6]

Huo D., 2023, P ACM SIGMETRICS, P81

[7] Large deviations asymptotics and the spectral theory of multiplicatively regular Markov processes [J].

Kontoyiannis, I ;

Meyn, SP .

ELECTRONIC JOURNAL OF PROBABILITY, 2005, 10 :61-123

[8]

Lauand C. K., 2023, ARXIV230902944

[9]

Lauand C. K., 2022, ALL C COMM CONTR COM, P1

[10]

Lauand C. K., 2022, ARXIV220706371

← 1 2 →