Model Predictive Control-Based Value Estimation for Efficient Reinforcement Learning

Cited by: 0

Authors:

Wu, Qizhen [1]

Liu, Kexin [1]

Chen, Lei [2]

Affiliations:

[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing 100191, Peoples R China

[2] Beijing Inst Technol, Adv Res Inst Multidisciplinary Sci, Beijing 100081, Peoples R China

Source:

IEEE INTELLIGENT SYSTEMS | 2024 / Vol. 39 / No. 03

Funding:

National Key Research and Development Program of China; National Science Foundation (United States);

Keywords:

Predictive models; Neural networks; Trajectory; Data models; Computational modeling; Training; Optimization; Reinforcement learning; Predictive control;

DOI:

10.1109/MIS.2024.3386204

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement learning (RL) is limited in real-world practice primarily by the number of interactions with virtual environments that it requires. This poses a challenging problem because, for many learning methods, it is implausible to obtain even a locally optimal strategy from only a few attempts. We therefore design an improved RL method based on model predictive control that models the environment through a data-driven approach. Based on the learned environment model, it performs multistep prediction to estimate the value function and optimize the policy. The method demonstrates higher learning efficiency, faster convergence of strategies toward the locally optimal value, and a smaller sample capacity required of experience replay buffers. Experimental results, both on classic databases and in a dynamic obstacle-avoidance scenario for an unmanned aerial vehicle, validate the proposed approach.
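
The multistep prediction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: this record does not give the paper's exact value target, so the sketch assumes the generic H-step form, in which a learned model is rolled forward under the current policy, discounted predicted rewards are summed, and a value function bootstraps the tail. All names here (`multistep_value_estimate`, `toy_model`, `toy_policy`) are hypothetical, and the toy dynamics stand in for a data-driven environment model.

```python
from typing import Callable, Tuple

State = float
Action = float


def multistep_value_estimate(
    state: State,
    model: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    value_fn: Callable[[State], float],
    horizon: int,
    gamma: float = 0.99,
) -> float:
    """Estimate V(state) from an H-step rollout of a learned model.

    Assumed generic form (not taken from the paper):
        V_hat(s_0) = sum_{k=0}^{H-1} gamma^k * r_k + gamma^H * V(s_H)
    """
    if horizon < 0:
        raise ValueError("horizon must be non-negative")
    total = 0.0
    discount = 1.0
    s = state
    for _ in range(horizon):
        a = policy(s)            # action chosen by the current policy
        s, r = model(s, a)       # predicted next state and reward
        total += discount * r
        discount *= gamma
    # Bootstrap the remaining return with the value function at s_H.
    return total + discount * value_fn(s)


def toy_model(s: State, a: Action) -> Tuple[State, float]:
    """Hypothetical stand-in for a learned model: s' = s + a, r = -s^2."""
    return s + a, -s * s


def toy_policy(s: State) -> Action:
    """Hypothetical policy that halves the distance to the origin."""
    return -0.5 * s


if __name__ == "__main__":
    estimate = multistep_value_estimate(
        2.0, toy_model, toy_policy, lambda s: 0.0, horizon=3, gamma=0.9
    )
    print(estimate)
```

In a sketch like this, a longer horizon leans more on the learned model and less on the bootstrapped value function, so the horizon trades model error against value-function error.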

Pages: 63-72

Page count: 10

