Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

Cited by: 58
Authors
Dilokthanakul, Nat [1 ,2 ]
Kaplanis, Christos [1 ]
Pawlowski, Nick [1 ]
Shanahan, Murray [1 ,3 ]
Affiliations
[1] Imperial Coll London, Comp Dept, London SW7 2AZ, England
[2] Vidyasirimedhi Inst Sci & Technol, Rayong 21210, Thailand
[3] DeepMind, London N1C 4AG, England
Keywords
Task analysis; Reinforcement learning; Training; Neural networks; Visualization; Trajectory; Learning systems; Auxiliary task; deep reinforcement learning (DRL); hierarchical reinforcement learning (HRL); intrinsic motivation; EXPLORATION;
DOI
10.1109/TNNLS.2019.2891792
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
One of the main concerns of deep reinforcement learning (DRL) is data inefficiency, which stems both from an inability to fully utilize the data acquired and from naive exploration strategies. To alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency via both the utilization of unrewarded experiences and the exploration strategy, by combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks that involve manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control of the metacontroller. In this way, we reinterpret the notion of pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. It outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We found that the inclusion of the intrinsic reward is crucial to the improvement in performance, and that most of the benefit appears to derive from the representations learned during training.
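The control loop the abstract describes can be illustrated with a minimal sketch: every few steps a metacontroller selects which feature the subcontroller should try to change, and the subcontroller receives an intrinsic reward proportional to the change it induces in that feature. All names here (`features`, `intrinsic_reward`, the horizon and feature count, and the toy "environment") are hypothetical stand-ins, not the paper's actual architecture, which uses learned neural representations and policies.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FEATURES = 8     # hypothetical number of controllable features
META_HORIZON = 4   # metacontroller picks a new goal every META_HORIZON steps

def features(obs):
    """Stand-in for a learned feature map phi(obs); here the identity."""
    return obs

def intrinsic_reward(phi_prev, phi_next, goal):
    """Reward the subcontroller for changing the feature it was asked to
    control (a simplified feature-control reward)."""
    return abs(phi_next[goal] - phi_prev[goal])

# Toy rollout: a random-walk "environment" whose state is the feature vector.
obs = np.zeros(N_FEATURES)
total_intrinsic = 0.0
for t in range(16):
    if t % META_HORIZON == 0:
        # Metacontroller's goal choice; random here, learned in the paper.
        goal = int(rng.integers(N_FEATURES))
    phi_prev = features(obs).copy()
    obs = obs + rng.normal(scale=0.1, size=N_FEATURES)  # subcontroller "acts"
    total_intrinsic += intrinsic_reward(phi_prev, features(obs), goal)
```

In the actual method, the metacontroller's choices are themselves trained on the extrinsic task reward, so feature-control skills double as a temporally extended exploration mechanism.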
Pages: 3409 - 3418
Page count: 10
Related Papers
50 items total
  • [31] Developing Driving Strategies Efficiently: A Skill-Based Hierarchical Reinforcement Learning Approach
    Gurses, Yigit
    Buyukdemirci, Kaan
    Yildiz, Yildiray
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 121 - 126
  • [32] Collective Intrinsic Motivation of a Multi-agent System Based on Reinforcement Learning Algorithms
    Bolshakov, Vladislav
    Sakulin, Sergey
    Alfimtsev, Alexander
    INTELLIGENT SYSTEMS AND APPLICATIONS, VOL 4, INTELLISYS 2023, 2024, 825 : 655 - 670
  • [33] Hierarchical Affordance Discovery using Intrinsic Motivation
    Manoury, Alexandre
    Nguyen, Sao Mai
    Buche, Cedric
    PROCEEDINGS OF THE 7TH INTERNATIONAL CONFERENCE ON HUMAN-AGENT INTERACTION (HAI'19), 2019, : 186 - 193
  • [34] Automatic Hierarchical Reinforcement Learning for Reusing Service Process Fragments
    Yang, Rong
    Li, Bing
    Liu, Zhengli
    IEEE ACCESS, 2021, 9 : 20746 - 20759
  • [35] Hierarchical Reinforcement Learning With Universal Policies for Multistep Robotic Manipulation
    Yang, Xintong
    Ji, Ze
    Wu, Jing
    Lai, Yu-Kun
    Wei, Changyun
    Liu, Guoliang
    Setchi, Rossitza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (09) : 4727 - 4741
  • [36] Robots Learn Increasingly Complex Tasks with Intrinsic Motivation and Automatic Curriculum Learning: Domain Knowledge by Emergence of Affordances, Hierarchical Reinforcement and Active Imitation Learning
    Nguyen, Sao Mai
    Duminy, Nicolas
    Manoury, Alexandre
    Duhaut, Dominique
    Buche, Cedric
    KI - Künstliche Intelligenz, 2021, 35 : 81 - 90
  • [37] Hierarchical and Stable Multiagent Reinforcement Learning for Cooperative Navigation Control
    Jin, Yue
    Wei, Shuangqing
    Yuan, Jian
    Zhang, Xudong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (01) : 90 - 103
  • [38] Control parallel double inverted pendulum by hierarchical reinforcement learning
    Zheng, Y
    Luo, SW
    Lv, Z
    Wu, LN
    2004 7TH INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING PROCEEDINGS, VOLS 1-3, 2004, : 1614 - 1617
  • [39] ELSIM: End-to-End Learning of Reusable Skills Through Intrinsic Motivation
    Aubret, Arthur
    Matignon, Laetitia
    Hassas, Salima
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT II, 2021, 12458 : 541 - 556
  • [40] Ethological Concepts in Hierarchical Reinforcement Learning and Control of Intelligent Agents
    Nahodil, Pavel
    23RD EUROPEAN CONFERENCE ON MODELLING AND SIMULATION (ECMS 2009), 2009, : 180 - 186