Uncertainty-Aware Data Augmentation for Offline Reinforcement Learning

Cited by: 1
Authors
Su, Yunjie [1 ]
Kong, Yilun [1 ]
Wang, Xueqian [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen, Peoples R China
Source
2023 International Joint Conference on Neural Networks (IJCNN) | 2023
Keywords
Data augmentation; Uncertainty estimation; Out of distribution; Offline reinforcement learning;
DOI
10.1109/IJCNN54540.2023.10191211
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
One of the key challenges in offline reinforcement learning is that the agent cannot perform further environment exploration and therefore generalizes poorly to out-of-distribution states. Data augmentation is commonly used to mitigate the limited coverage of the full state-action space in a static offline dataset. However, existing data augmentation methods for proprioceptive observations face a dilemma: overly tight constraints limit the added data coverage, while overly aggressive perturbations can degrade performance. At the heart of this phenomenon are the divergence of the action distribution and the high uncertainty of the value function. In this paper, we propose to extend the static offline dataset during training by adding gradient-based perturbations to the states and using the estimated uncertainty of the value function to constrain the perturbation range. The estimated uncertainty serves as guidance that adjusts the augmentation range automatically, ensuring the adaptability and reliability of the state perturbation. The proposed algorithm, Uncertainty-Aware Data Augmentation (UADA), is plugged into several standard offline RL algorithms and evaluated on a range of offline reinforcement learning tasks. The empirical results confirm that UADA substantially improves performance and achieves better model stability compared with the original algorithms.
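As a rough illustration of the idea described in the abstract, the following Python sketch (assuming PyTorch and an ensemble of Q-networks) perturbs each state along the gradient of the mean Q-value and shrinks the step size where ensemble disagreement, used here as the value-function uncertainty, is large. The function name, the ensemble-based uncertainty estimate, and the specific scaling rule are illustrative assumptions, not the paper's exact formulation.

    import torch

    def uada_perturb(state, action, q_ensemble, eps_max=0.01):
        # Hypothetical sketch of uncertainty-aware state augmentation.
        # q_ensemble is an assumed list of Q-networks mapping (state, action)
        # to a scalar value per sample; the paper only specifies that the
        # value-function uncertainty bounds the gradient-based perturbation.
        state = state.clone().detach().requires_grad_(True)
        q_values = torch.stack(
            [q(state, action).squeeze(-1) for q in q_ensemble], dim=0
        )
        uncertainty = q_values.std(dim=0).detach()  # ensemble std per sample
        grad = torch.autograd.grad(q_values.mean(dim=0).sum(), state)[0]

        # Assumed scaling rule: allow larger perturbations where the value
        # estimate is reliable, and shrink them where uncertainty is high,
        # so augmented states stay close to the dataset support.
        eps = eps_max / (1.0 + uncertainty).unsqueeze(-1)
        return (state + eps * grad.sign()).detach()

In an actual pipeline, the perturbed states would replace or accompany the original transitions in the update step of whichever base offline RL algorithm UADA is plugged into.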
Pages: 8