DCAC: Reducing Unnecessary Conservatism in Offline-to-online Reinforcement Learning

Cited: 1
Authors
Chen, Dongxiang [1 ]
Wen, Ying [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
Source
2023 5TH INTERNATIONAL CONFERENCE ON DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2023 | 2023
Keywords
Reinforcement Learning; Offline-to-online; Finetune
DOI
10.1145/3627676.3627677
CLC Classification Code
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in offline reinforcement learning (RL) have made it possible to train capable agents from fixed datasets alone. Nevertheless, dataset quality is critical to an agent's performance, and high-quality datasets are scarce, so agents trained offline often need further improvement through subsequent environmental interaction. In particular, state-action distribution shift can damage a well-initialized policy, which impedes the straightforward application of off-policy RL algorithms to policies trained offline. Predominant offline-to-online RL approaches are founded on conservatism, a property that can inadvertently limit asymptotic performance. In response, we propose Dynamically Constrained Actor-Critic (DCAC), grounded in the mathematical form of dynamically constrained policy optimization. DCAC adjusts the constraint on policy optimization according to a specified rule, stabilizing the initial online learning stage while reducing the undue conservatism that restricts asymptotic performance. Comprehensive experiments on diverse locomotion tasks show that our method improves policies trained offline on various datasets through subsequent online interaction, mitigates the harmful effects of distribution shift, and consistently attains better asymptotic performance than prior works.
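
Illustrative sketch (not from the paper). The abstract does not spell out the concrete update rule, so the following is one plausible reading of "dynamically constrained policy optimization": an off-policy actor objective augmented with a behavior constraint whose weight is relaxed as online data accumulates. The names dcac_actor_loss and constraint_schedule, the squared-error constraint, and the linear decay rule are all assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch (PyTorch) of a dynamically constrained actor update.
# ASSUMPTIONS: the function names, the squared-error behavior constraint,
# and the linear decay schedule are illustrative; the paper's rule may differ.
import torch


def dcac_actor_loss(q_value, pi_action, behavior_action, alpha_t):
    """Actor loss: maximize Q(s, pi(s)) under a weighted behavior
    constraint, i.e. -Q + alpha_t * ||pi(s) - a_behavior||^2."""
    rl_term = -q_value.mean()  # standard off-policy actor objective
    constraint = ((pi_action - behavior_action) ** 2).sum(dim=-1).mean()
    return rl_term + alpha_t * constraint


def constraint_schedule(step, alpha_0=1.0, decay_steps=50_000, alpha_min=0.0):
    """Hypothetical 'specified rule': linearly relax the constraint weight
    from alpha_0 toward alpha_min as online environment steps accrue."""
    frac = min(step / decay_steps, 1.0)
    return (1.0 - frac) * alpha_0 + frac * alpha_min


# Toy usage: a tight constraint early on stabilizes the offline-initialized
# policy; as alpha_t -> alpha_min the update approaches unconstrained RL.
batch, act_dim = 4, 6
q = torch.randn(batch)                                  # critic estimate Q(s, pi(s))
a_pi = torch.randn(batch, act_dim, requires_grad=True)  # current policy actions
a_beh = torch.randn(batch, act_dim)                     # replay/dataset actions
loss = dcac_actor_loss(q, a_pi, a_beh, constraint_schedule(step=10_000))
loss.backward()
```

Under this reading, a large early weight keeps the policy close to its offline initialization while the critic adapts to online data, and the decaying weight removes the conservatism that would otherwise cap asymptotic performance.
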
Pages: 12
Related Papers
50 records in total
  • [31] Byzantine-Robust Online and Offline Distributed Reinforcement Learning
    Chen, Yiding
    Zhang, Xuezhou
    Zhang, Kaiqing
    Wang, Mengdi
    Zhu, Xiaojin
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023
  • [32] Reducing reinforcement learning to KWIK online regression
    Li, Lihong
    Littman, Michael L.
    ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2010, 58 (3-4) : 217 - 237
  • [34] Complementarity and Cannibalization of Offline-to-Online Targeting: A Field Experiment on Omnichannel Commerce
    Luo, Xueming
    Zhang, Yuchi
    Zeng, Fue
    Qu, Zhe
    MIS QUARTERLY, 2020, 44 (02) : 957 - 982
  • [35] Offline-to-Online Learning Enabled Robust Control for Uncertain Robotic Systems Pursuing Constraint-Following
    Zheng, Runze
    Chen, Tianxiang
    Zhang, Xinglong
    Zhang, Zheshuo
    Jing, Xingjian
    Yin, Hui
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2024
  • [36] Learning on the Job: Self-Rewarding Offline-to-Online Finetuning for Industrial Insertion of Novel Connectors from Vision
    Nair, Ashvin
    Zhu, Brian
    Narayanan, Gokul
    Solowjow, Eugen
    Levine, Sergey
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023 : 7154 - 7161
  • [37] Online Tuning for Offline Decentralized Multi-Agent Reinforcement Learning
    Jiang, Jiechuan
    Lu, Zongqing
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37, NO 7, 2023 : 8050+
  • [38] Offline Meta-Reinforcement Learning with Online Self-Supervision
    Pong, Vitchyr H.
    Nair, Ashvin
    Smith, Laura
    Huang, Catherine
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [39] A maintenance planning framework using online and offline deep reinforcement learning
    Bukhsh, Zaharah A.
    Molegraaf, Hajo
    Jansen, Nils
    NEURAL COMPUTING & APPLICATIONS, 2023
  • [40] Hybrid Offline/Online Optimization for Energy Management via Reinforcement Learning
    Silvestri, Mattia
    De Filippo, Allegra
    Ruggeri, Federico
    Lombardi, Michele
    INTEGRATION OF CONSTRAINT PROGRAMMING, ARTIFICIAL INTELLIGENCE, AND OPERATIONS RESEARCH, CPAIOR 2022, 2022, 13292 : 358 - 373