Reinforcement Learning for Partially Observable Linear Gaussian Systems Using Batch Dynamics of Noisy Observations

Cited by: 2
Authors
Yaghmaie, Farnaz Adib [1 ]
Modares, Hamidreza [2 ]
Gustafsson, Fredrik [1 ]
机构
[1] Linkoping Univ, Fac Elect Engn, S-58183 Linkoping, Sweden
[2] Michigan State Univ, Coll Engn, E Lansing, MI 48824 USA
Funding
Swedish Research Council; U.S. National Science Foundation;
Keywords
Costs; History; Noise; Dynamical systems; Noise measurement; Heuristic algorithms; Data models; Linear quadratic Gaussian; partially observable dynamical systems; reinforcement learning;
DOI
10.1109/TAC.2024.3385680
CLC number
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
Reinforcement learning algorithms are commonly used to control dynamical systems whose state variables are measurable. If the dynamical system is only partially observable, reinforcement learning algorithms must be modified to compensate for the effect of partial observability. One common approach is to feed the controller a finite history of input-output data in place of the state variable. In this article, we study and quantify the effect of this approach in linear Gaussian systems with quadratic costs. We coin the concept of L-Extra-Sampled-dynamics to formalize the idea of using a finite history of input-output data instead of the state, and we show that this approach increases the average cost.
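To make the history-feedback idea concrete, the sketch below rolls out a partially observable linear Gaussian system and feeds an L-step stack of past outputs and inputs to a linear policy in place of the unmeasured state. This is only an illustrative simulation under assumed values, not the authors' implementation: the system matrices A, B, C, the noise scales, the horizon T, the history length L, and the zero placeholder gain K_hist are all our own choices, and K_hist stands in for a gain that a reinforcement learning procedure would learn.

import numpy as np

# Assumed example system (not from the paper):
# x_{t+1} = A x_t + B u_t + w_t,  y_t = C x_t + v_t,  with x_t unmeasured.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)   # quadratic stage cost x'Qx + u'Ru
L, T = 3, 5000                # history length and rollout horizon

# Linear policy acting on the stacked history z_t = [y_{t-L+1..t}; u_{t-L..t-1}].
# K_hist is a placeholder gain; in the paper's setting it would be learned.
K_hist = np.zeros((1, L * (C.shape[0] + B.shape[1])))

x = np.zeros((2, 1))
ys = [np.zeros((1, 1)) for _ in range(L)]   # zero-padded initial history
us = [np.zeros((1, 1)) for _ in range(L)]
cost = 0.0
for t in range(T):
    y = C @ x + 0.1 * rng.standard_normal((1, 1))     # noisy observation
    ys = ys[1:] + [y]
    z = np.vstack(ys + us)                            # history vector z_t
    u = K_hist @ z                                    # history-feedback control
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    us = us[1:] + [u]
    x = A @ x + B @ u + 0.1 * rng.standard_normal((2, 1))  # process noise
print(f"average cost over {T} steps: {cost / T:.4f}")

Treating the stacked history z_t as the state induces the batch dynamics that the abstract formalizes as L-Extra-Sampled-dynamics; the paper's contribution is to quantify how controlling through z_t rather than the true state x_t increases the achievable average cost relative to the state-feedback LQG optimum.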
Pages: 6397-6404
Page count: 8
Related papers
50 records in total
  • [31] Lai, Jing; Xiong, Junlin. Reinforcement learning for linear exponential quadratic Gaussian problem. SYSTEMS & CONTROL LETTERS, 2024, 185.
  • [32] Liang, Zhixuan; Cao, Jiannong; Lin, Wanyu; Chen, Jinlin; Xu, Huafeng. Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment. 2021 IEEE THIRD INTERNATIONAL CONFERENCE ON COGNITIVE MACHINE INTELLIGENCE (COGMI 2021), 2021: 272-281.
  • [33] Asiain, Erick; Clempner, Julio B.; Poznyak, Alexander S. A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2018, 27(08).
  • [34] Omatu, Ngozi; Phillips, Joshua L. Benefits of Combining Dimensional Attention and Working Memory for Partially Observable Reinforcement Learning Problems. ACMSE 2021: PROCEEDINGS OF THE 2021 ACM SOUTHEAST CONFERENCE, 2021: 209-213.
  • [35] Khoshkbari, Hesam; Kaddoum, Georges. Deep Recurrent Reinforcement Learning for Partially Observable User Association in a Vertical Heterogenous Network. IEEE COMMUNICATIONS LETTERS, 2023, 27(12): 3235-3239.
  • [36] Mehrivash, Hamed; Valadbeigi, Amir Parviz; Shu, Zhan. Quantized Control Design for Linear Systems Using Reinforcement Learning. IFAC PAPERSONLINE, 2023, 56(02): 3800-+.
  • [37] Byun, Ha-Eun; Kim, Boeun; Lee, Jay H. Embedding active learning in batch-to-batch optimization using reinforcement learning. AUTOMATICA, 2023, 157.
  • [38] Rangel-Martinez, Daniel; Ricardez-Sandoval, Luis A. A recurrent reinforcement learning strategy for optimal scheduling of partially observable job-shop and flow-shop batch chemical plants under uncertainty. COMPUTERS & CHEMICAL ENGINEERING, 2024, 188.
  • [39] Vengerov, David. A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2008, 24(07): 687-693.
  • [40] Osada, H.; Fujita, S. CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D(05): 1004-1011.