Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

Times cited: 31

Authors:

Li, Yingying [1]

Tang, Yujie [1]

Zhang, Runyu [1]

Li, Na [1]

Affiliation:

[1] Harvard Univ, John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2022 / Vol. 67 / No. 12

Funding:

U.S. National Science Foundation (NSF)

Keywords:

Distributed reinforcement learning (RL); linear quadratic regulator (LQR); zero-order optimization; multiagent systems

DOI:

10.1109/TAC.2021.3128592

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

This article considers a distributed reinforcement learning problem for decentralized linear quadratic (LQ) control with partial state observations and local costs. We propose a zero-order distributed policy optimization algorithm (ZODPO) that learns linear local controllers in a distributed fashion, leveraging the ideas of policy gradient, zero-order optimization, and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus and then performs local policy gradient updates in parallel based on zero-order gradient estimation. ZODPO requires only limited communication and storage, even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated throughout ZODPO are stabilizing with high probability. Finally, we test ZODPO numerically on multizone HVAC systems.
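The abstract combines three ingredients: consensus averaging of local costs, zero-order (derivative-free) gradient estimation, and parallel local updates. The sketch below shows how these pieces can fit together on a toy problem. It is not ZODPO as specified in the paper. The following are all illustrative assumptions: a scalar parameter per agent in place of a linear controller gain, a hypothetical coupled quadratic cost in place of an LQ cost evaluated from rollouts, a ring communication graph with 1/3 weights, and a two-point Gaussian-smoothing estimator. The paper's estimator, sampling scheme, and step sizes may differ.

```python
import random


def ring_weights(n):
    """Hypothetical doubly stochastic weights on a ring graph: each agent keeps
    1/3 of its own value and takes 1/3 from each of its two neighbors."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        w[i][i] += 1.0 / 3
        w[i][(i - 1) % n] += 1.0 / 3
        w[i][(i + 1) % n] += 1.0 / 3
    return w


def consensus(values, w, rounds):
    """Run `rounds` steps of x <- W x; every entry approaches mean(values)."""
    n = len(values)
    x = list(values)
    for _ in range(rounds):
        x = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def local_costs(k, targets):
    """Hypothetical coupled local costs standing in for LQ costs: agent i's
    cost depends on its own parameter and on its right neighbor's."""
    n = len(k)
    return [(k[i] - targets[i]) ** 2 + 0.5 * (k[i] - k[(i + 1) % n]) ** 2
            for i in range(n)]


def global_cost(k, targets):
    """Global objective: the average of all local costs."""
    costs = local_costs(k, targets)
    return sum(costs) / len(costs)


def zero_order_distributed_descent(targets, iters=2000, r=0.05, eta=0.02,
                                   rounds=20, seed=0):
    """Toy distributed zero-order descent (illustrative, not ZODPO itself)."""
    rng = random.Random(seed)
    n = len(targets)
    w = ring_weights(n)
    k = [0.0] * n  # one scalar "policy parameter" per agent
    for _ in range(iters):
        # Each agent independently draws its own Gaussian perturbation.
        d = [rng.gauss(0.0, 1.0) for _ in range(n)]
        k_plus = [k[i] + r * d[i] for i in range(n)]
        k_minus = [k[i] - r * d[i] for i in range(n)]
        # Agents observe only local costs; consensus gives each agent its own
        # estimate of the global (average) cost at both perturbed points.
        j_plus = consensus(local_costs(k_plus, targets), w, rounds)
        j_minus = consensus(local_costs(k_minus, targets), w, rounds)
        # Two-point zero-order update using only agent i's own quantities.
        k = [k[i] - eta * (j_plus[i] - j_minus[i]) / (2 * r) * d[i]
             for i in range(n)]
    return k


if __name__ == "__main__":
    targets = [1.0, -1.0, 2.0, 0.0, 0.5]
    k = zero_order_distributed_descent(targets)
    print("initial cost:", global_cost([0.0] * len(targets), targets))
    print("final cost:  ", global_cost(k, targets))
```

Because consensus is linear, `j_plus[i] - j_minus[i]` equals a consensus estimate of the average cost difference, so each agent's update needs only its own perturbation component and its own consensus output. No agent ever sees the full parameter vector.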


Pages: 6429-6444

Number of pages: 16
