State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning With Rewards

Cited by: 4
Authors
Calvo-Fullana, Miguel [1 ]
Paternain, Santiago [2 ]
Chamon, Luiz F. O. [3 ]
Ribeiro, Alejandro [4 ]
Affiliations
[1] Univ Pompeu Fabra, Dept Informat & Commun Technol, Barcelona 08002, Spain
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[3] Univ Stuttgart, Excellence Cluster Simulat Technol, D-70174 Stuttgart, Germany
[4] Univ Penn, Dept Elect & Syst Engn, Philadelphia, PA 19104 USA
Keywords
Reinforcement learning; Trajectory; Task analysis; Monitoring; Optimization; Convergence; Systematics; Autonomous systems; Actor-critic algorithm; Approximation
DOI
10.1109/TAC.2023.3319070
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. For this class of problems, we present a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multiplier evolution. This approach provides a systematic state-augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, while previous methods can fail to find optimal policies, as we illustrate with an example, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.
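To make the mechanism in the abstract concrete, below is a minimal, hypothetical Python sketch, not the authors' implementation. The setting is to maximize an objective value V_0(pi) subject to constraint values V_i(pi) >= c_i. In the sketch, the Lagrange multipliers lambda are treated as part of the state, the policy is executed conditioned on (s, lambda), and the dual dynamics update lambda from the constraint rewards accumulated over each epoch. The toy environment, the hand-crafted policy, and all hyperparameters are illustrative placeholders; training the augmented policy itself (e.g., by any standard RL method on the lambda-weighted reward) is omitted.

# Hypothetical sketch of state-augmented constrained RL: lambda joins the
# state, the augmented policy is executed, and the dual dynamics run online.
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 4, 2        # toy MDP dimensions (assumed)
N_CONSTRAINTS = 1                 # one constraint reward r1 with threshold c1
THRESHOLDS = np.array([0.3])      # c_i: required average constraint reward
ETA, EPOCH_LEN, N_EPOCHS = 0.05, 200, 50

def step(s, a):
    # Placeholder dynamics: random next state, an objective reward r0 and one
    # constraint reward r1 (both invented for illustration).
    s_next = int(rng.integers(N_STATES))
    r0 = 1.0 if s == a % N_STATES else 0.0
    r1 = 1.0 if a == 0 else 0.0
    return s_next, r0, np.array([r1])

def policy(s, lam):
    # Augmented policy pi(a | s, lambda): a softmax over a hand-crafted score
    # in which lambda raises the appeal of the constraint-satisfying action.
    scores = np.array([lam[0], 1.0])
    p = np.exp(scores - scores.max())
    p /= p.sum()
    return int(rng.choice(N_ACTIONS, p=p))

lam = np.zeros(N_CONSTRAINTS)     # Lagrange multipliers, appended to the state
s = int(rng.integers(N_STATES))
for epoch in range(N_EPOCHS):
    acc = np.zeros(N_CONSTRAINTS)  # constraint rewards accumulated this epoch
    for t in range(EPOCH_LEN):
        a = policy(s, lam)         # execute the augmented policy pi(. | s, lambda)
        s, r0, r_c = step(s, a)
        acc += r_c
    # Dual dynamics: projected subgradient step on the multipliers. lambda_i
    # grows while the average constraint reward is below its threshold and is
    # projected back toward zero once the constraint is satisfied.
    lam = np.maximum(lam + ETA * (THRESHOLDS - acc / EPOCH_LEN), 0.0)

print("final multipliers:", lam)

In this toy, action 0 earns the constraint reward, so as lambda grows the softmax shifts probability toward it until the average constraint reward reaches the threshold; the multiplier then stops growing, mirroring the paper's claim that executing the augmented policy under the dual dynamics samples actions from the optimal policy.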
Pages: 4275-4290
Page count: 16