Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Cited by: 3
Authors
Chen, Xiong-Hui [1 ]
Luo, Fan-Ming [1 ]
Yu, Yang [1 ]
Li, Qingyang [2 ]
Qin, Zhiwei [2 ]
Shang, Wenjie [3 ]
Ye, Jieping [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
U.S. National Science Foundation
关键词
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In reinforcement learning, learning from an offline dataset is a promising way to avoid the costs of online trial and error. To ensure robustness, current offline reinforcement learning methods commonly constrain the learned policy to the in-support regions of the offline dataset; such constraints, however, also limit the potential of the resulting policies. In this paper, to unlock that potential, we investigate decision-making in out-of-support regions directly and propose offline Model-based Adaptable Policy LEarning (MAPLE). Instead of learning only within in-support regions, MAPLE learns an adaptable policy that can adjust its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques, and conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method makes robust decisions in out-of-support regions and outperforms state-of-the-art algorithms.
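The abstract's core idea, training a context-conditioned policy against an ensemble of learned dynamics models so that it can adapt its behavior under unfamiliar dynamics at deployment, can be illustrated with a minimal toy sketch. All names here, and the simple recursive coefficient estimator standing in for the paper's meta-learned RNN context encoder, are illustrative assumptions, not the paper's actual implementation:

```python
# Toy setting: scalar dynamics x' = c * x + a, where the coefficient c
# is unknown at deployment time.

class AdaptablePolicy:
    """Context-conditioned policy: keeps a running estimate of the
    dynamics coefficient (a stand-in for a meta-learned RNN context
    encoder) and acts to drive the state toward zero."""

    def __init__(self, coeff_est=1.0):
        self.coeff_est = coeff_est

    def act(self, state):
        # Choose a so that coeff_est * state + a == 0.
        return -self.coeff_est * state

    def update_context(self, prev_state, action, next_state):
        # Recover the coefficient observed in the last transition and
        # blend it into the running estimate (the adaptation step).
        if abs(prev_state) > 1e-8:
            observed = (next_state - action) / prev_state
            self.coeff_est = 0.5 * self.coeff_est + 0.5 * observed


def rollout(coeff, policy, x0=1.0, horizon=10):
    """Run the policy in dynamics x' = coeff * x + a; return |x_final|."""
    x = x0
    for _ in range(horizon):
        a = policy.act(x)
        x_next = coeff * x + a
        policy.update_context(x, a, x_next)
        x = x_next
    return abs(x)


# "Training": an ensemble of learned models covers the in-support dynamics.
ensemble_coeffs = [0.8, 1.0, 1.2]
train_errs = [rollout(c, AdaptablePolicy()) for c in ensemble_coeffs]

# Deployment on dynamics outside the ensemble's support: rather than
# failing on the unseen coefficient, the policy adapts online.
test_err = rollout(2.5, AdaptablePolicy())
```

Because the policy re-estimates the dynamics from its own recent transitions, the same mechanism that handles every ensemble member at training time also drives the state near zero under the out-of-support coefficient 2.5, which is the adaptability the paper targets.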
Pages: 15260-15274
Page count: 15
Related Papers
50 records in total
[41] Li, Xin; Xu, Xin; Zuo, Lei. Reinforcement Learning Based Overtaking Decision-Making for Highway Autonomous Driving [J]. 2015 SIXTH INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND INFORMATION PROCESSING (ICICIP), 2015: 336-342
[42] Li, Dong; Zhao, Dongbin; Zhang, Qichao. Reinforcement Learning based Lane Change Decision-Making with Imaginary Sampling [J]. 2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019: 16-21
[43] Chen, Yi; Yu, Zhuo; Han, Zhenxiang; Sun, Weihong; He, Liang. A Decision-Making System for Cotton Irrigation Based on Reinforcement Learning Strategy [J]. AGRONOMY-BASEL, 2024, 14 (01)
[44] Jin L.; Han G.; Xie X.; Guo B.; Liu G.; Zhu W. Review of Autonomous Driving Decision-Making Research Based on Reinforcement Learning [J]. Qiche Gongcheng/Automotive Engineering, 2023, 45 (04): 527-540
[45] Lempert, Robert J.; Turner, Sara. Engaging Multiple Worldviews With Quantitative Decision Support: A Robust Decision-Making Demonstration Using the Lake Model [J]. RISK ANALYSIS, 2021, 41 (06): 845-865
[46] Zhang, Xiaohan; Liu, Lu; Long, Guodong; Jiang, Jing; Liu, Shenquan. Episodic memory governs choices: An RNN-based reinforcement learning model for decision-making task [J]. NEURAL NETWORKS, 2021, 134: 1-10
[47] Cao, Yupeng; Luo, Wei; Xue, Yadong; Lin, Weiren; Zhang, Feng. Model-based offline reinforcement learning framework for optimizing tunnel boring machine operation [J]. UNDERGROUND SPACE, 2024, 19: 47-71
[48] Chen X.; Liu J.; Wang Z.; Han X.; Sun Y.; Zheng X. Decision-Making Models Based on Meta-Reinforcement Learning for Intelligent Vehicles at Urban Intersections [J]. Journal of Beijing Institute of Technology (English Edition), 2022, 31 (04): 327-339
[49] Sun, Yuewen; Yuan, Xin; Liu, Wenzhang; Sun, Changyin. Model-Based Reinforcement Learning via Proximal Policy Optimization [J]. 2019 CHINESE AUTOMATION CONGRESS (CAC2019), 2019: 4736-4740
[50] Wang, Dongshu; Liu, Qi; Gao, Xulin; Liu, Lei. A Behavioral Decision-Making Model of Learning and Memory for Mobile Robot Triggered by Curiosity [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2025, 17 (02): 352-365