Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Times Cited: 0
Authors
Chen, Xiong-Hui [1 ]
Luo, Fan-Ming [1 ]
Yu, Yang [1 ]
Li, Qingyang [2 ]
Qin, Zhiwei [2 ]
Shang, Wenjie [3 ]
Ye, Jieping [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
National Science Foundation (US);
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In reinforcement learning, a promising way to avoid the cost of online trial and error is to learn from an offline dataset. Current offline reinforcement learning methods typically constrain the learned policy to the in-support regions defined by the offline dataset in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of those policies. In this paper, to unlock the potential of offline policy learning, we directly investigate decision-making in out-of-support regions and propose offline Model-based Adaptable Policy LEarning (MAPLE). Instead of restricting learning to in-support regions, this approach learns an adaptable policy that can adjust its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques, and we evaluate it on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method makes robust decisions in out-of-support regions and achieves better performance than state-of-the-art algorithms.
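The sketch below illustrates the general idea the abstract describes, not the authors' implementation: an ensemble of learned dynamics models supplies diverse simulated environments, and a recurrent context encoder conditions the policy so it can adapt its behavior to whichever dynamics it is currently facing. All class names, network sizes, and the rollout routine are hypothetical placeholders assumed for illustration.

```python
# Hedged sketch of an adaptable, model-based offline RL setup (illustrative only).
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One member of the dynamics ensemble: predicts next state and reward."""
    def __init__(self, state_dim, action_dim, hidden=200):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim + 1),  # state delta + reward
        )

    def forward(self, state, action):
        out = self.net(torch.cat([state, action], dim=-1))
        return state + out[..., :-1], out[..., -1:]

class AdaptablePolicy(nn.Module):
    """Policy conditioned on a recurrent context summarizing recent transitions."""
    def __init__(self, state_dim, action_dim, context_dim=32):
        super().__init__()
        self.encoder = nn.GRU(2 * state_dim + action_dim, context_dim,
                              batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(state_dim + context_dim, 256), nn.ReLU(),
            nn.Linear(256, action_dim), nn.Tanh(),
        )

    def act(self, state, context):
        return self.actor(torch.cat([state, context], dim=-1))

    def update_context(self, transition, hidden):
        # transition: concatenated (s, a, s'), shape (batch, 1, dim)
        _, hidden = self.encoder(transition, hidden)
        return hidden

def rollout(policy, ensemble, init_state, horizon=10, context_dim=32):
    """Simulated rollout: each step uses a randomly chosen ensemble member,
    so the context-conditioned policy must adapt to the current dynamics."""
    state = init_state
    hidden = torch.zeros(1, state.shape[0], context_dim)  # GRU state = context
    for _ in range(horizon):
        action = policy.act(state, hidden[-1])
        model = ensemble[torch.randint(len(ensemble), (1,)).item()]
        next_state, reward = model(state, action)
        transition = torch.cat([state, action, next_state], dim=-1).unsqueeze(1)
        hidden = policy.update_context(transition, hidden)
        state = next_state
    return state
```

In such a setup, the ensemble members are trained on the offline data, policy gradients flow through rollouts like the one above, and at deployment the same context encoder keeps updating from real transitions, which is what allows the behavior to adapt in out-of-support regions.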
Pages: 15260 - 15274
Number of pages: 15
Related papers (50 records in total)
  • [1] RelTrans: An Enhancing Offline Reinforcement Learning Model for the Complex Hand Gesture Decision-Making Task
    Chen, Xiangwei
    Zeng, Zhixia
    Xiao, Ruliang
    Rida, Imad
    Zhang, Shi
    IEEE TRANSACTIONS ON CONSUMER ELECTRONICS, 2024, 70 (01) : 3762 - 3769
  • [2] Online model-based reinforcement learning for decision-making in long distance routes
    Alcaraz, Juan J.
    Losilla, Fernando
    Caballero-Arnaldos, Luis
    TRANSPORTATION RESEARCH PART E-LOGISTICS AND TRANSPORTATION REVIEW, 2022, 164
  • [3] Uncertainty-Aware Model-Based Offline Reinforcement Learning for Automated Driving
    Diehl, Christopher
    Sievernich, Timo Sebastian
    Kruger, Martin
    Hoffmann, Frank
    Bertram, Torsten
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (02) : 1167 - 1174
  • [4] Model-Based Offline Reinforcement Learning for Autonomous Delivery of Guidewire
    Li, Hao
    Zhou, Xiao-Hu
    Xie, Xiao-Liang
    Liu, Shi-Qi
    Feng, Zhen-Qiu
    Gui, Mei-Jiang
    Xiang, Tian-Yu
    Huang, De-Xing
    Hou, Zeng-Guang
    IEEE TRANSACTIONS ON MEDICAL ROBOTICS AND BIONICS, 2024, 6 (03): : 1054 - 1062
  • [5] BiES: Adaptive Policy Optimization for Model-Based Offline Reinforcement Learning
    Yang, Yijun
    Jiang, Jing
    Wang, Zhuowei
    Duan, Qiqi
    Shi, Yuhui
    AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13151 : 570 - 581
  • [6] Offline model-based reinforcement learning with causal structured world models
    Zhu, Zhengmao
    Tian, Honglong
    Chen, Xionghui
    Zhang, Kun
    Yu, Yang
    FRONTIERS OF COMPUTER SCIENCE, 2025, 19 (04)
  • [7] An Analysis of Offline Model-Based Learning with Action Noise
    Li, Haoya
    Gangwani, Tanmay
    Ying, Lexing
    JOURNAL OF SCIENTIFIC COMPUTING, 2025, 103 (02)
  • [8] Offline Model-Based Reinforcement Learning for Tokamak Control
    Char, Ian
    Abbate, Joseph
    Bardoczi, Laszlo
    Boyer, Mark D.
    Chung, Youngseog
    Conlin, Rory
    Erickson, Keith
    Mehta, Viraj
    Richner, Nathan
    Kolemen, Egemen
    Schneider, Jeff
    LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, VOL 211, 2023, 211
  • [9] Model-Based Offline Policy Optimization with Distribution Correcting Regularization
    Shen, Jian
    Chen, Mingcheng
    Zhang, Zhicheng
    Yang, Zhengyu
    Zhang, Weinan
    Yu, Yong
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975 : 174 - 189
  • [10] Model-based offline reinforcement learning for sustainable fishery management
    Ju, Jun
    Kurniawati, Hanna
    Kroese, Dirk
    Ye, Nan
    EXPERT SYSTEMS, 2025, 42 (01)