Offline reinforcement learning with representations for actions

Cited by: 4
Authors
Lou, Xingzhou [1, 2]
Yin, Qiyue [1]
Zhang, Junge [1]
Yu, Chao [3]
He, Zhaofeng [4]
Cheng, Nengjie [5]
Huang, Kaiqi [1]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[3] Sun Yat Sen Univ, Guangzhou, Peoples R China
[4] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[5] Nanchang Univ, Nanchang, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Offline reinforcement learning; Action embedding
DOI
10.1016/j.ins.2022.08.019
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Prevailing offline reinforcement learning (RL) methods constrain the policy to the region supported by the offline dataset to avoid the distributional shift problem. However, these methods neglect potentially high-reward actions that lie outside the distribution of the dataset. To address this issue, we propose a new method that generalizes from the offline dataset to out-of-distribution (OOD) actions. Specifically, we design a novel action embedding model to help infer the effect of actions. As a result, our value function generalizes better over the action space, which further alleviates the distributional shift caused by overestimation of OOD actions. Theoretically, we give an information-theoretic explanation of the improvement in the value function's generalization over the action space. Experiments on D4RL demonstrate that our model improves performance compared to previous offline RL methods, especially when the experience in the offline dataset is good. We conduct a further study and validate that the value function's generalization to OOD actions is improved, which reinforces the effectiveness of our proposed action embedding model. © 2022 Published by Elsevier Inc.
Pages: 746-758
Page count: 13