Reinforcement Learning for Partially Observable Linear Gaussian Systems Using Batch Dynamics of Noisy Observations

Cited by: 2
Authors
Yaghmaie, Farnaz Adib [1 ]
Modares, Hamidreza [2 ]
Gustafsson, Fredrik [1 ]
机构
[1] Linkoping Univ, Fac Elect Engn, S-58183 Linkoping, Sweden
[2] Michigan State Univ, Coll Engn, E Lansing, MI 48824 USA
Funding
Swedish Research Council; U.S. National Science Foundation;
Keywords
Costs; History; Noise; Dynamical systems; Noise measurement; Heuristic algorithms; Data models; Linear quadratic Gaussian; partially observable dynamical systems; reinforcement learning;
DOI
10.1109/TAC.2024.3385680
CLC number
TP [automation technology, computer technology];
Discipline code
0812;
Abstract
Reinforcement learning algorithms are commonly used to control dynamical systems whose state variables are measurable. If the dynamical system is only partially observable, reinforcement learning algorithms must be modified to compensate for the effect of partial observability. One common approach is to feed the controller a finite history of input-output data in place of the state variable. In this article, we study and quantify the effect of this approach in linear Gaussian systems with quadratic costs. We coin the concept of L-Extra-Sampled-dynamics to formalize the idea of using a finite history of input-output data instead of the state, and we show that this approach increases the average cost.
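To make the history-feedback idea concrete, the sketch below rolls out a partially observable linear Gaussian system and feeds an L-step stack of past outputs and inputs to a linear policy in place of the unmeasured state. This is only an illustrative simulation under assumed values, not the authors' implementation: the system matrices A, B, C, the noise scales, the horizon T, the history length L, and the zero placeholder gain K_hist are all our own choices, and K_hist stands in for a gain that a reinforcement learning procedure would learn.

import numpy as np

# Assumed example system (not from the paper):
# x_{t+1} = A x_t + B u_t + w_t,  y_t = C x_t + v_t,  with x_t unmeasured.
rng = np.random.default_rng(0)
A = np.array([[0.9, 0.2],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])
Q, R = np.eye(2), np.eye(1)   # quadratic stage cost x'Qx + u'Ru
L, T = 3, 5000                # history length and rollout horizon

# Linear policy acting on the stacked history z_t = [y_{t-L+1..t}; u_{t-L..t-1}].
# K_hist is a placeholder gain; in the paper's setting it would be learned.
K_hist = np.zeros((1, L * (C.shape[0] + B.shape[1])))

x = np.zeros((2, 1))
ys = [np.zeros((1, 1)) for _ in range(L)]   # zero-padded initial history
us = [np.zeros((1, 1)) for _ in range(L)]
cost = 0.0
for t in range(T):
    y = C @ x + 0.1 * rng.standard_normal((1, 1))     # noisy observation
    ys = ys[1:] + [y]
    z = np.vstack(ys + us)                            # history vector z_t
    u = K_hist @ z                                    # history-feedback control
    cost += float(x.T @ Q @ x + u.T @ R @ u)
    us = us[1:] + [u]
    x = A @ x + B @ u + 0.1 * rng.standard_normal((2, 1))  # process noise
print(f"average cost over {T} steps: {cost / T:.4f}")

Treating the stacked history z_t as the state induces the batch dynamics that the abstract formalizes as L-Extra-Sampled-dynamics; the paper's contribution is to quantify how controlling through z_t rather than the true state x_t increases the achievable average cost relative to the state-feedback LQG optimum.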
Pages: 6397-6404
Page count: 8
Related papers
50 records in total
  • [31] Lai, Jing; Xiong, Junlin. Reinforcement learning for linear exponential quadratic Gaussian problem. SYSTEMS & CONTROL LETTERS, 2024, 185.
  • [32] Liang, Zhixuan; Cao, Jiannong; Lin, Wanyu; Chen, Jinlin; Xu, Huafeng. Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment. 2021 IEEE THIRD INTERNATIONAL CONFERENCE ON COGNITIVE MACHINE INTELLIGENCE (COGMI 2021), 2021: 272-281.
  • [33] Asiain, Erick; Clempner, Julio B.; Poznyak, Alexander S. A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models. INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2018, 27(08).
  • [34] Omatu, Ngozi; Phillips, Joshua L. Benefits of Combining Dimensional Attention and Working Memory for Partially Observable Reinforcement Learning Problems. ACMSE 2021: PROCEEDINGS OF THE 2021 ACM SOUTHEAST CONFERENCE, 2021: 209-213.
  • [35] Khoshkbari, Hesam; Kaddoum, Georges. Deep Recurrent Reinforcement Learning for Partially Observable User Association in a Vertical Heterogenous Network. IEEE COMMUNICATIONS LETTERS, 2023, 27(12): 3235-3239.
  • [36] Mehrivash, Hamed; Valadbeigi, Amir Parviz; Shu, Zhan. Quantized Control Design for Linear Systems Using Reinforcement Learning. IFAC PAPERSONLINE, 2023, 56(02): 3800-+.
  • [37] Byun, Ha-Eun; Kim, Boeun; Lee, Jay H. Embedding active learning in batch-to-batch optimization using reinforcement learning. AUTOMATICA, 2023, 157.
  • [38] Rangel-Martinez, Daniel; Ricardez-Sandoval, Luis A. A recurrent reinforcement learning strategy for optimal scheduling of partially observable job-shop and flow-shop batch chemical plants under uncertainty. COMPUTERS & CHEMICAL ENGINEERING, 2024, 188.
  • [39] Vengerov, David. A gradient-based reinforcement learning approach to dynamic pricing in partially-observable environments. FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2008, 24(07): 687-693.
  • [40] Osada, H.; Fujita, S. CHQ: A multi-agent reinforcement learning scheme for partially observable Markov decision processes. IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2005, E88D(05): 1004-1011.