Model-Based Reinforcement Learning With Probabilistic Ensemble Terminal Critics for Data-Efficient Control Applications

Cited by: 4

Authors:

Park, Jonghyeok [1]

Jeon, Soo [2]

Han, Soohee [1]

Affiliations:

[1] Pohang Univ Sci & Technol, Dept Elect Engn & Convergence IT Engn, Pohang 37673, South Korea

[2] Univ Waterloo, Dept Mech & Mechatron Engn, Waterloo, ON N2L 3G1, Canada

Source:

IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS | 2024 / Vol. 71 / No. 08

Funding:

National Research Foundation, Singapore;

Keywords:

Heuristic algorithms; Data models; Robots; Probabilistic logic; Computational modeling; Reliability; Reinforcement learning; Cartpole system; model-predictive controller (MPC); model-based reinforcement learning (RL); probabilistic ensemble terminal critics (PETC);

DOI:

10.1109/TIE.2023.3331074

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This article proposes a data-efficient model-based reinforcement learning (RL) algorithm empowered by reliable future reward estimates achieved through confidence-based probabilistic ensemble terminal critics (PETC). The proposed algorithm utilizes a model-predictive controller to choose an action that optimizes the sum of the near and distant future rewards for a given current state. Near future rewards with high confidence are determined directly from trained deterministic dynamics and reward models. Distant future rewards beyond these horizons are meticulously assessed using the proposed confidence-based PETC, which minimizes estimation errors inherent in the distant future and quantifies uncertainty confidence. Through such confidence-based guided actions, the proposed approach is expected to operate in a reliable, explainable, and data-efficient manner, consistently guiding the system to an optimal trajectory. A comparison with existing state-of-the-art RL algorithms on eight DeepMind Control Suite tasks confirms the superior data efficiency of the proposed approach, which achieves an average cumulative reward of 761.2 in merely 500K steps, whereas the other algorithms score below 700.0. The proposed algorithm is also successfully applied to two real-world control applications, namely single- and double-cartpole swing-up tasks.
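
The planning idea summarized in the abstract (near-future rewards from learned models, plus a distant-future value from an ensemble of terminal critics) can be illustrated with a rough sketch. This is not the authors' implementation: the random-shooting planner, the `mean - kappa * std` confidence penalty, the discount factor, and every name below (`score_sequences`, `plan_action`, `toy_dynamics`, `toy_reward`, `toy_critics`) are assumptions made only for illustration, and the toy models stand in for trained networks.

```python
import numpy as np


def score_sequences(state, actions, dynamics, reward, critics,
                    kappa=1.0, gamma=0.99):
    """Score action sequences of shape (n, horizon, action_dim)."""
    n, horizon, _ = actions.shape
    states = np.repeat(state[None, :], n, axis=0)
    returns = np.zeros(n)
    for t in range(horizon):
        # Near-future rewards come from the (here: toy) deterministic models.
        returns += gamma ** t * reward(states, actions[:, t])
        states = dynamics(states, actions[:, t])
    # Distant-future value: every critic in the ensemble scores the final state.
    values = np.stack([critic(states) for critic in critics])  # (n_critics, n)
    # ASSUMPTION: ensemble disagreement is penalized as mean - kappa * std.
    # The paper's actual confidence measure may differ.
    terminal = values.mean(axis=0) - kappa * values.std(axis=0)
    return returns + gamma ** horizon * terminal


def plan_action(state, dynamics, reward, critics, horizon=5,
                n_candidates=256, action_dim=1, rng=None):
    """Random-shooting MPC: sample sequences, return the best first action."""
    rng = np.random.default_rng() if rng is None else rng
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, action_dim))
    scores = score_sequences(state, actions, dynamics, reward, critics)
    best = int(np.argmax(scores))
    return actions[best, 0], scores[best]


# Hypothetical toy problem: drive a 1-D point toward the origin.
def toy_dynamics(s, a):
    return s + 0.1 * a


def toy_reward(s, a):
    return -(s ** 2).sum(axis=1)


# Three stand-in "critics" that disagree slightly about the terminal value.
toy_critics = [lambda s, c=c: -c * (s ** 2).sum(axis=1) for c in (0.8, 1.0, 1.2)]

action, value = plan_action(np.array([1.0]), toy_dynamics, toy_reward, toy_critics)
```

In this sketch, a larger `kappa` makes the planner more cautious about terminal states on which the critics disagree; how the published method weighs confidence is described in the article itself.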

Pages: 9470-9479

Page count: 10
