Offline Model-Based Adaptable Policy Learning for Decision-Making in Out-of-Support Regions

Cited by: 3
Authors
Chen, Xiong-Hui [1 ]
Luo, Fan-Ming [1 ]
Yu, Yang [1 ]
Li, Qingyang [2 ]
Qin, Zhiwei [2 ]
Shang, Wenjie [3 ]
Ye, Jieping [3 ]
Affiliations
[1] Nanjing Univ, Natl Key Lab Novel Software Technol, Nanjing 210023, Jiangsu, Peoples R China
[2] DiDi Labs, Mountain View, CA 94043 USA
[3] DiDi Chuxing, Beijing 300450, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Adaptation models; Uncertainty; Predictive models; Behavioral sciences; Extrapolation; Trajectory; Reinforcement learning; Adaptable policy learning; meta learning; model-based reinforcement learning; offline reinforcement learning;
DOI
10.1109/TPAMI.2023.3317131
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In reinforcement learning, a promising direction for avoiding the cost of online trial-and-error is to learn from an offline dataset. Current offline reinforcement learning methods commonly constrain learning to the policy space supported by the offline dataset (the in-support regions) in order to ensure the robustness of the resulting policies. Such constraints, however, also limit the potential of those policies. In this paper, to unlock the potential of offline policy learning, we directly investigate decision-making in out-of-support regions and propose offline Model-based Adaptable Policy LEarning (MAPLE). Instead of learning only within in-support regions, this approach learns an adaptable policy that can adjust its behavior in out-of-support regions when deployed. We give a practical implementation of MAPLE via meta-learning and ensemble model learning techniques. We conduct experiments on MuJoCo locomotion tasks with offline datasets. The results show that the proposed method makes robust decisions in out-of-support regions and achieves better performance than state-of-the-art algorithms.
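The abstract names two ingredients of MAPLE: an ensemble of learned dynamics models and a meta-learned adaptable policy whose behavior can change in out-of-support regions at deployment time. The sketch below is not the authors' implementation; it is a minimal PyTorch illustration of one way such a structure could look, assuming a GRU-based context encoder that summarizes recent transitions into an embedding the policy conditions on. All module names, layer sizes, and the demo dimensions are illustrative assumptions.

```python
# Minimal sketch (not the paper's code) of an ensemble of dynamics models plus
# an adaptable policy conditioned on a recurrent context embedding.
import torch
import torch.nn as nn


class DynamicsEnsemble(nn.Module):
    """Ensemble of MLP dynamics models predicting (next_state, reward)."""

    def __init__(self, state_dim, action_dim, n_models=7, hidden=200):
        super().__init__()
        self.models = nn.ModuleList([
            nn.Sequential(
                nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, state_dim + 1),  # next state + reward
            ) for _ in range(n_models)
        ])

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        preds = torch.stack([m(x) for m in self.models])  # (E, B, S+1)
        return preds[..., :-1], preds[..., -1]             # next states, rewards


class AdaptablePolicy(nn.Module):
    """Policy conditioned on a context embedding of recent transitions."""

    def __init__(self, state_dim, action_dim, context_dim=32, hidden=256):
        super().__init__()
        # GRU summarizes the recent (s, a, s') history into a context vector.
        self.encoder = nn.GRU(2 * state_dim + action_dim, context_dim,
                              batch_first=True)
        self.actor = nn.Sequential(
            nn.Linear(state_dim + context_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim), nn.Tanh(),
        )

    def forward(self, state, history):
        # history: (B, T, 2*state_dim + action_dim); use the final hidden state.
        _, h = self.encoder(history)
        context = h[-1]                                    # (B, context_dim)
        return self.actor(torch.cat([state, context], dim=-1))


if __name__ == "__main__":
    S, A, B, T = 11, 3, 4, 10
    ensemble = DynamicsEnsemble(S, A)
    policy = AdaptablePolicy(S, A)
    state = torch.randn(B, S)
    history = torch.randn(B, T, 2 * S + A)
    action = policy(state, history)                 # (B, A)
    next_states, rewards = ensemble(state, action)  # (E, B, S), (E, B)
    print(action.shape, next_states.shape, rewards.shape)
```

In a setup like this, the context embedding would be refreshed from the transitions observed at each deployed step, which is the mechanism by which the policy could adapt once a rollout drifts outside the support of the offline data; ensemble disagreement could likewise serve as an out-of-support signal during model-based training. Both uses are inferred from the abstract, not taken from the paper's released implementation.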
Pages: 15260-15274
Number of Pages: 15