LEARNING DIVERSE SUB-POLICIES VIA A TASK-AGNOSTIC REGULARIZATION ON ACTION DISTRIBUTIONS

Citations: 0
Authors
Huo, Liangyu [1 ]
Wang, Zulin [1 ]
Xu, Mai [1 ]
Song, Yuhang [2 ]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
[2] Univ Oxford, Dept Comp Sci, Oxford, England
Source
2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING | 2020
Keywords
Sub-policy discovery; regularization; hierarchical reinforcement learning; reinforcement learning; REINFORCEMENT; FRAMEWORK
DOI
10.1109/icassp40776.2020.9053393
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Automatic sub-policy discovery has recently received much attention in hierarchical reinforcement learning (HRL). Conventional approaches to learning sub-policies suffer from collapse, in which a single sub-policy comes to dominate the whole task, because they lack mechanisms to ensure diversity among sub-policies. In this paper, we formulate the discovery of diverse sub-policies as a trajectory inference problem. We then propose an information-theoretic objective based on action distributions to encourage diversity. Moreover, two simplifications are derived for discrete and continuous action spaces to reduce computation. Finally, experimental results on two different HRL domains show that the proposed approach further improves state-of-the-art methods without modifying their existing hyperparameters, suggesting the wide applicability and robustness of our approach.
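The record contains no code, and the paper's exact objective is not reproduced here. As a rough illustration of the general idea only — rewarding disagreement between the action distributions of different sub-policies so that no single one collapses onto the whole task — the sketch below computes a mean pairwise KL divergence over discrete action distributions. The function name `diversity_bonus` and the use of plain softmax policies are assumptions for illustration, not the authors' formulation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax over action logits."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def diversity_bonus(logits):
    """Mean pairwise KL divergence between the action distributions
    of K sub-policies at one state (logits has shape K x |A|).

    A larger value means the sub-policies act more differently, so
    adding this term as a bonus discourages collapse onto one sub-policy.
    """
    p = softmax(logits)  # K x |A| action probabilities
    K = p.shape[0]
    total = 0.0
    for i in range(K):
        for j in range(K):
            if i != j:
                total += np.sum(p[i] * (np.log(p[i]) - np.log(p[j])))
    return total / (K * (K - 1))

# Example: three sub-policies over four discrete actions.
# Identical (uniform) sub-policies yield zero diversity; distinct
# logits yield a strictly positive bonus.
rng = np.random.default_rng(0)
identical = np.zeros((3, 4))
distinct = rng.normal(scale=3.0, size=(3, 4))
print(diversity_bonus(identical))  # 0.0
print(diversity_bonus(distinct))   # > 0
```

In practice such a term would be weighted and added to the task reward or policy-gradient loss; the paper's contribution includes simplifications that avoid computing all pairwise terms explicitly, which this sketch does not attempt.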
Pages: 3932-3936 (5 pages)
References
22 in total
  • [11] Kulkarni T. D., 2016, Advances in neural information processing systems, P3675, DOI 10.48550/ARXIV.1604.06057
  • [12] Lillicrap T. P., 2015, INT C LEARN REPR
  • [13] Machado MC, 2017, PR MACH LEARN RES, V70
  • [14] Mnih V, 2016, PR MACH LEARN RES, V48
  • [15] Nachum O, 2018, ADV NEUR IN, V31
  • [16] Parr R, 1998, ADV NEUR IN, V10, P1043
  • [17] Schulman J., arXiv
  • [18] Sutton RS, Precup D, Singh S, Between MDPs and semi-MDPs: A framework for temporal abstraction in reinforcement learning, ARTIFICIAL INTELLIGENCE, 1999, 112(1-2), P181-211
  • [19] Tessler C, 2017, AAAI CONF ARTIF INTE, P1553
  • [20] Todorov E, 2012, IEEE INT C INT ROBOT, P5026, DOI 10.1109/IROS.2012.6386109