Deep Reinforcement Learning With Part-Aware Exploration Bonus in Video Games

Cited by: 3
Authors
Xu, Pei [1 ,2 ]
Yin, Qiyue [2 ,3 ]
Zhang, Junge [2 ,3 ]
Huang, Kaiqi [2 ,3 ,4 ,5 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Ctr Res Intelligent Syst & Engn, Beijing 100190, Peoples R China
[3] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[4] Chinese Acad Sci, Natl Lab Pattern Recognit, Inst Automat, Beijing 100190, Peoples R China
[5] CAS Ctr Excellence Brain Sci & Intelligence Techno, Beijing 100190, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Deep learning; exploration; reinforcement learning; video game;
DOI
10.1109/TG.2021.3134259
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning algorithms rely on carefully engineered environment rewards that are extrinsic to agents. However, environments with dense rewards are rare, motivating the development of reward functions that are intrinsic to agents. Curiosity is a successful type of intrinsic reward function that uses prediction error as the reward signal. In prior work, the prediction problem used to generate intrinsic rewards is optimized in pixel space rather than in a learnable feature space, in order to avoid the randomness caused by changing features. However, these methods ignore small but important elements of the state, such as the character's location, which prevents them from generating accurate intrinsic rewards for efficient exploration. In this article, we first demonstrate the effectiveness of introducing prior learned features into existing prediction-based exploration methods. Then, an attention map mechanism is designed to discretize the learned features, so that the features can keep being updated while the impact of their learning-induced randomness on the intrinsic rewards is reduced. We verify our method on video games from the standard Atari reinforcement learning benchmark and achieve clear improvements over random network distillation, one of the most advanced exploration methods, in almost all of the games.
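The exploration bonus described in the abstract follows the curiosity paradigm: a prediction problem is trained alongside the policy, and its error serves as the intrinsic reward. As an illustration only (not the authors' code), the sketch below shows how a random-network-distillation-style bonus can be computed in PyTorch; the RNDBonus class, network sizes, and dimensions are hypothetical choices for this example.

# Minimal sketch of a prediction-error exploration bonus in the style of
# random network distillation (RND): a fixed, randomly initialized "target"
# network embeds the observation, a trainable "predictor" tries to match that
# embedding, and the squared prediction error is used as the intrinsic reward.
import torch
import torch.nn as nn

class RNDBonus(nn.Module):
    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        # Fixed target network; its parameters are never trained.
        self.target = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                    nn.Linear(256, feat_dim))
        for p in self.target.parameters():
            p.requires_grad_(False)
        # Trainable predictor network.
        self.predictor = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                       nn.Linear(256, feat_dim))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Intrinsic reward = per-observation squared prediction error.
        with torch.no_grad():
            target_feat = self.target(obs)
        pred_feat = self.predictor(obs)
        return ((pred_feat - target_feat) ** 2).mean(dim=-1)

# Usage: the returned bonus is added to the extrinsic reward, and the
# predictor is trained by minimizing the same error on visited states.
bonus_fn = RNDBonus(obs_dim=64)
obs = torch.randn(32, 64)          # batch of flattened observations
intrinsic_reward = bonus_fn(obs)   # shape: (32,)
loss = intrinsic_reward.mean()     # predictor training loss
loss.backward()

In the setting described by the abstract, the analogous prediction error would be computed over learned, attention-discretized features rather than raw pixels, but the overall structure of the bonus is the same.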
Pages: 644-653
Number of pages: 10