Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Times Cited: 0
Authors
Chen, Xiong-Hui [1 ]
Luo, Fan-Ming [1 ]
Yu, Yang [1 ]
Li, Qingyang [2 ]
Qin, Zhiwei [2 ]
Shang, Wenjie [3 ]
Ye, Jieping [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
National Science Foundation (US);
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In reinforcement learning, a promising way to avoid the cost of online trial and error is to learn from an offline dataset. Current offline reinforcement learning methods typically constrain the learned policy to the in-support regions defined by the offline dataset in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of those policies. In this paper, to unlock the potential of offline policy learning, we directly investigate decision-making in out-of-support regions and propose offline Model-based Adaptable Policy LEarning (MAPLE). Instead of restricting learning to in-support regions, this approach learns an adaptable policy that can adjust its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques, and we evaluate it on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method makes robust decisions in out-of-support regions and achieves better performance than state-of-the-art algorithms.
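The sketch below illustrates the general idea the abstract describes, not the authors' implementation: an ensemble of learned dynamics models supplies diverse simulated environments, and a recurrent context encoder conditions the policy so it can adapt its behavior to whichever dynamics it is currently facing. All class names, network sizes, and the rollout routine are hypothetical placeholders assumed for illustration.

```python
# Hedged sketch of an adaptable, model-based offline RL setup (illustrative only).
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One member of the dynamics ensemble: predicts next state and reward."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # state delta + reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return state + out[..., :-1], out[..., -1:]

class AdaptablePolicy(nn.Module):
    """Policy conditioned on a recurrent context summarizing recent transitions."""
    def __init__(self, state_dim, action_dim, context_dim=32):
        super().__init__()
        self.encoder = nn.GRU(2 * state_dim + action_dim, context_dim,
                              batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(state_dim + context_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def act(self, state, context):
        return self.actor(torch.cat([state, context], dim=-1))

    def update_context(self, transition, hidden):
        # transition: concatenated (s, a, s'), shape (batch, 1, dim)
        _, hidden = self.encoder(transition, hidden)
        return hidden

def rollout(policy, ensemble, init_state, horizon=10, context_dim=32):
    """Simulated rollout: each step uses a randomly chosen ensemble member,
    so the context-conditioned policy must adapt to the current dynamics."""
    state = init_state
    hidden = torch.zeros(1, state.shape[0], context_dim)  # GRU state = context
    for _ in range(horizon):
        action = policy.act(state, hidden[-1])
        model = ensemble[torch.randint(len(ensemble), (1,)).item()]
        next_state, reward = model(state, action)
        transition = torch.cat([state, action, next_state], dim=-1).unsqueeze(1)
        hidden = policy.update_context(transition, hidden)
        state = next_state
    return state
```

In such a setup, the ensemble members are trained on the offline data, policy gradients flow through rollouts like the one above, and at deployment the same context encoder keeps updating from real transitions, which is what allows the behavior to adapt in out-of-support regions.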
Pages: 15260 - 15274
Number of pages: 15
Related papers (50 records in total)
  • [1] RelTrans: An Enhancing Offline Reinforcement Learning Model for the Complex Hand Gesture Decision-Making Task
    Chen, Xiangwei
    Zeng, Zhixia
    Xiao, Ruliang
    Rida, Imad
    Zhang, Shi
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 3762 - 3769
  • [2] Online model-based reinforcement learning for decision-making in long distance routes
    Alcaraz, Juan J.
    Losilla, Fernando
    Caballero-Arnaldos, Luis
    TRANSPORTATION RESEARCH PART E-LOGISTICS AND TRANSPORTATION REVIEW, 2022, 164
  • [3] Uncertainty-Aware Model-Based Offline Reinforcement Learning for Automated Driving
    Diehl, Christopher
    Sievernich, Timo Sebastian
    Kruger, Martin
    Hoffmann, Frank
    Bertram, Torsten
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (02) : 1167 - 1174
  • [4] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire
    Li, Hao
    Zhou, Xiao-Hu
    Xie, Xiao-Liang
    Liu, Shi-Qi
    Feng, Zhen-Qiu
    Gui, Mei-Jiang
    Xiang, Tian-Yu
    Huang, De-Xing
    Hou, Zeng-Guang
    IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2024, 6 (03): : 1054 - 1062
  • [5] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
    Yang, Yijun
    Jiang, Jing
    Wang, Zhuowei
    Duan, Qiqi
    Shi, Yuhui
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581
  • [6] Offline model-based reinforcement learning with causal structured world models
    Zhu, Zhengmao
    Tian, Honglong
    Chen, Xionghui
    Zhang, Kun
    Yu, Yang
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (04)
  • [7] An Analysis of Offline Model-Based Learning with Action Noise
    Li, Haoya
    Gangwani, Tanmay
    Ying, Lexing
    JOURNAL OF SCIENTIFIC COMPUTING, 2025, 103 (02)
  • [8] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [9] Model-Based Offline Policy Optimization with Distribution Correcting Regularization
    Shen, Jian
    Chen, Mingcheng
    Zhang, Zhicheng
    Yang, Zhengyu
    Zhang, Weinan
    Yu, Yong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975 : 174 - 189
  • [10] Model-based offline reinforcement learning for sustainable fishery management
    Ju, Jun
    Kurniawati, Hanna
    Kroese, Dirk
    Ye, Nan
    EXPERT SYSTEMS, 2025, 42 (01)