Data-efficient model-based reinforcement learning with trajectory discrimination

Cited by: 0
Authors
Qu, Tuo [1 ]
Duan, Fuqing [1 ]
Zhang, Junge [2 ]
Zhao, Bo [3 ]
Huang, Wenzhen [2 ]
Affiliations
[1] Beijing Normal Univ, Sch Artificial Intelligence, 19 Xinjiekou Outer St, Beijing 100875, Peoples R China
[2] Chinese Acad Sci, Inst Automat, 95 Zhongguancun East Rd, Beijing 100190, Peoples R China
[3] Beijing Normal Univ, Sch Syst Sci, 19 Xinjiekou Outer St, Beijing 100875, Peoples R China
Keywords
Reinforcement learning; Deep learning; Continuous control task; World model; Objective penalty function; Predictive control; Tracking; Optimization
DOI
10.1007/s40747-023-01247-5
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Deep reinforcement learning is widely used to solve high-dimensional, complex sequential decision problems. However, one of its biggest challenges is sample efficiency, especially on such high-dimensional problems. Model-based reinforcement learning can address the sample-efficiency problem with a learned world model, but its performance is limited by imperfections in that model, so it usually achieves worse asymptotic performance than model-free reinforcement learning. In this paper, we propose a novel model-based reinforcement learning algorithm called World Model with Trajectory Discrimination (WMTD). We learn a representation of temporal dynamics by adding a trajectory discriminator to the world model, and then compute weights for state-value estimation from the discriminator to optimize the policy. Specifically, we augment trajectories to generate negative samples and train a trajectory discriminator that shares its feature extractor with the world model. Experimental results demonstrate that our method improves sample efficiency and achieves state-of-the-art performance on DeepMind Control tasks.
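The abstract outlines the core mechanism: a trajectory discriminator sharing a feature extractor with the world model is trained to separate real trajectories from augmented negative ones, and its output weights the state-value estimates used for policy optimization. Below is a minimal PyTorch-style sketch of that idea; the encoder interface, the GRU-based discriminator architecture, and the weighting rule are illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class TrajectoryDiscriminator(nn.Module):
    # Scores an encoded trajectory (batch, time, feature_dim) with the
    # probability that it is a real trajectory rather than an augmented one.
    def __init__(self, feature_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.rnn = nn.GRU(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        _, h = self.rnn(features)             # h: (num_layers, batch, hidden_dim)
        return self.head(h[-1]).squeeze(-1)   # (batch,) probabilities in [0, 1]

def discriminator_loss(disc, encoder, real_obs, neg_obs):
    # Binary cross-entropy between real trajectories and augmented negatives.
    # `encoder` is assumed to be the feature extractor shared with the world
    # model, mapping (batch, time, obs_dim) -> (batch, time, feature_dim).
    bce = nn.BCELoss()
    p_real = disc(encoder(real_obs))
    p_neg = disc(encoder(neg_obs))
    return bce(p_real, torch.ones_like(p_real)) + bce(p_neg, torch.zeros_like(p_neg))

def weighted_value_estimate(disc, encoder, imagined_obs, values):
    # Hypothetical weighting rule: down-weight value estimates from imagined
    # trajectories that the discriminator considers implausible.
    with torch.no_grad():
        w = disc(encoder(imagined_obs))       # one weight per trajectory, in [0, 1]
    return (w * values).sum() / w.sum().clamp_min(1e-8)

In practice the discriminator would be trained jointly with the world model on replay-buffer trajectories, with the negative samples produced by trajectory augmentation; how those negatives are generated is left unspecified here.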
Pages: 1927-1936
Number of pages: 10
Related papers
50 records in total
  • [21] Data-Efficient Reinforcement Learning for Complex Nonlinear Systems
    Donge, Vrushabh S.
    Lian, Bosen
    Lewis, Frank L.
    Davoudi, Ali
    IEEE TRANSACTIONS ON CYBERNETICS, 2024, 54 (03) : 1391 - 1402
  • [22] A Data-Efficient Training Method for Deep Reinforcement Learning
    Feng, Wenhui
    Han, Chongzhao
    Lian, Feng
    Liu, Xia
    ELECTRONICS, 2022, 11 (24)
  • [23] Data-Efficient Deep Reinforcement Learning with Symmetric Consistency
    Zhang, Xianchao
    Yang, Wentao
    Zhang, Xiaotong
    Liu, Han
    Wang, Guanglu
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 2430 - 2436
  • [24] A Data-Efficient Model-Based Learning Framework for the Closed-Loop Control of Continuum Robots
    Wang, Xinran
    Rojas, Nicolas
    2022 IEEE 5TH INTERNATIONAL CONFERENCE ON SOFT ROBOTICS (ROBOSOFT), 2022, : 247 - 254
  • [25] Optimistic Sampling Strategy for Data-Efficient Reinforcement Learning
    Zhao, Dongfang
    Liu, Jiafeng
    Wu, Rui
    Cheng, Dansong
    Tang, Xianglong
    IEEE ACCESS, 2019, 7 : 55763 - 55769
  • [26] Concurrent Credit Assignment for Data-efficient Reinforcement Learning
    Dauce, Emmanuel
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [27] Optimizing Traffic Control with Model-Based Learning: A Pessimistic Approach to Data-Efficient Policy Inference
    Kunjir, Mayuresh
    Chawla, Sanjay
    Chandrasekar, Siddarth
    Jay, Devika
    Ravindran, Balaraman
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 1176 - 1187
  • [28] Model-Based Reinforcement Learning for Trajectory Tracking of Musculoskeletal Robots
    Xu, Haoran
    Fan, Jianyin
    Wang, Qiang
    2023 IEEE INTERNATIONAL INSTRUMENTATION AND MEASUREMENT TECHNOLOGY CONFERENCE, I2MTC, 2023,
  • [29] Continual Model-Based Reinforcement Learning for Data Efficient Wireless Network Optimisation
    Hasan, Cengis
    Agapitos, Alexandros
    Lynch, David
    Castagna, Alberto
    Cruciata, Giorgio
    Wang, Hao
    Milenovic, Aleksandar
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES: APPLIED DATA SCIENCE AND DEMO TRACK, ECML PKDD 2023, PT VI, 2023, 14174 : 295 - 311
  • [30] An Efficient Approach to Model-Based Hierarchical Reinforcement Learning
    Li, Zhuoru
    Narayan, Akshay
    Leong, Tze-Yun
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 3583 - 3589