Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

Cited by: 58
Authors
Dilokthanakul, Nat [1 ,2 ]
Kaplanis, Christos [1 ]
Pawlowski, Nick [1 ]
Shanahan, Murray [1 ,3 ]
Affiliations
[1] Imperial Coll London, Comp Dept, London SW7 2AZ, England
[2] Vidyasirimedhi Inst Sci & Technol, Rayong 21210, Thailand
[3] DeepMind, London N1C 4AG, England
Keywords
Task analysis; Reinforcement learning; Training; Neural networks; Visualization; Trajectory; Learning systems; Auxiliary task; deep reinforcement learning (DRL); hierarchical reinforcement learning (HRL); intrinsic motivation; EXPLORATION;
DOI
10.1109/TNNLS.2019.2891792
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
One of the main concerns in deep reinforcement learning (DRL) is data inefficiency, which stems both from an inability to fully utilize acquired data and from naive exploration strategies. To alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency via both the utilization of unrewarded experiences and an improved exploration strategy, combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks that involve manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control exercised by the metacontroller. In this way, we reinterpret pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. It outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We also found that the inclusion of the intrinsic reward is crucial for the performance improvement and that most of the benefit appears to derive from the representations learned during training.
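
The core mechanism described above (a metacontroller that picks which feature of the environment the subcontroller should change, with the induced change serving as the subcontroller's intrinsic reward) can be illustrated with a short sketch. The Python snippet below is a minimal illustration under that reading, not the authors' implementation; feature_control_reward, the random encoder outputs phi_t and phi_t1, and the goal index are hypothetical stand-ins.

# Minimal sketch (not the paper's code) of a feature-control intrinsic
# reward: the subcontroller is rewarded in proportion to the change it
# induces in the feature selected by the metacontroller.
import numpy as np

def feature_control_reward(phi_t, phi_t1, goal_idx):
    # Intrinsic reward = magnitude of change in the goal feature
    # between consecutive time steps.
    return float(np.abs(phi_t1[goal_idx] - phi_t[goal_idx]))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi_t = rng.normal(size=32)   # encoder features at step t (stand-in)
    phi_t1 = rng.normal(size=32)  # encoder features at step t+1 (stand-in)
    goal = 7                      # feature index chosen by the metacontroller
    # The subcontroller would maximize this signal for a fixed number of
    # steps, after which the metacontroller (trained on the extrinsic game
    # reward) would select a new feature to control.
    print(feature_control_reward(phi_t, phi_t1, goal))

In this hierarchical loop, pursuing such feature-control goals is what shapes the representation that, per the abstract's findings, accounts for most of the observed benefit.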
Pages: 3409-3418
Page count: 10