Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning

Cited by: 58
Authors
Dilokthanakul, Nat [1 ,2 ]
Kaplanis, Christos [1 ]
Pawlowski, Nick [1 ]
Shanahan, Murray [1 ,3 ]
Affiliations
[1] Imperial Coll London, Comp Dept, London SW7 2AZ, England
[2] Vidyasirimedhi Inst Sci & Technol, Rayong 21210, Thailand
[3] DeepMind, London N1C 4AG, England
Keywords
Task analysis; Reinforcement learning; Training; Neural networks; Visualization; Trajectory; Learning systems; Auxiliary task; deep reinforcement learning (DRL); hierarchical reinforcement learning (HRL); intrinsic motivation; EXPLORATION;
DOI
10.1109/TNNLS.2019.2891792
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the main concerns of deep reinforcement learning (DRL) is the data inefficiency problem, which stems both from an inability to fully utilize acquired data and from naive exploration strategies. To alleviate these problems, we propose a DRL algorithm that aims to improve data efficiency via both the utilization of unrewarded experiences and an improved exploration strategy, combining ideas from unsupervised auxiliary tasks, intrinsic motivation, and hierarchical reinforcement learning (HRL). Our method is based on a simple HRL architecture with a metacontroller and a subcontroller. The subcontroller is intrinsically motivated by the metacontroller to learn to control aspects of the environment, with the intention of giving the agent: 1) a neural representation that is generically useful for tasks that involve manipulation of the environment and 2) the ability to explore the environment in a temporally extended manner through the control of the metacontroller. In this way, we reinterpret the notion of pixel- and feature-control auxiliary tasks as reusable skills that can be learned via an intrinsic reward. We evaluate our method on a number of Atari 2600 games. We found that it outperforms the baseline in several environments and significantly improves performance in one of the hardest games, Montezuma's Revenge, for which the ability to utilize sparse data is key. We found that the inclusion of intrinsic reward is crucial for the improvement in performance, and that most of the benefit seems to be derived from the representations learned during training.
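The core mechanism the abstract describes can be illustrated with a toy sketch: the metacontroller selects a feature of the environment, and the subcontroller collects intrinsic reward proportional to how much it changes that feature between consecutive time steps. All names below (`feature_control_reward`, the toy loop, the step size) are hypothetical illustrations under this reading, not the paper's exact implementation.

```python
import numpy as np

def feature_control_reward(feat_t, feat_t1, k):
    """Hypothetical intrinsic reward: the magnitude of change in feature k
    (the feature the metacontroller asked the subcontroller to control)
    between two consecutive feature vectors."""
    return abs(float(feat_t1[k] - feat_t[k]))

# Toy interaction loop; the environment dynamics and policies are stand-ins.
num_features, horizon = 8, 4
goal_k = 3                        # metacontroller's choice of feature to control
feat = np.zeros(num_features)     # feature vector at time t
total_intrinsic = 0.0
for t in range(horizon):
    next_feat = feat.copy()
    next_feat[goal_k] += 0.5      # pretend the subcontroller's action moved the goal feature
    total_intrinsic += feature_control_reward(feat, next_feat, goal_k)
    feat = next_feat
```

In this sketch the subcontroller would be trained to maximize `total_intrinsic` for whichever `goal_k` the metacontroller issues, so each feature index corresponds to a reusable skill.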
Pages: 3409 - 3418
Page count: 10