Self-Supervised Attention-Aware Reinforcement Learning

Times cited: 0

Authors:

Wu, Haiping [1,2]

Khetarpal, Khimya [1,2]

Precup, Doina [1,2,3]

Affiliations:

[1] McGill Univ, Montreal, PQ, Canada

[2] Mila, Montreal, PQ, Canada

[3] Google DeepMind, Montreal, PQ, Canada

Source:

THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021 / Vol. 35

Keywords:

PREDICT;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual saliency has emerged as a major visualization tool for interpreting deep reinforcement learning (RL) agents. However, much of the existing research uses it as an analysis tool rather than as an inductive bias for policy learning. In this work, we use visual attention as an inductive bias for RL agents. We propose a novel self-supervised attention learning approach that can (1) learn to select regions of interest without explicit annotations, and (2) act as a plug-in for existing deep RL methods to improve learning performance. We empirically show that self-supervised attention-aware deep RL methods outperform the baselines in both rate of convergence and performance. Furthermore, the proposed self-supervised attention is neither tied to specific policies nor restricted to a specific scene. We posit that the proposed approach is a general self-supervised attention module for multi-task learning and transfer learning, and we empirically validate its generalization ability. Finally, we show that our method learns meaningful object keypoints, highlighting improvements both qualitatively and quantitatively.


Pages: 10311-10319

Page count: 9

