Bayes-adaptive hierarchical MDPs

Cited by: 2
Authors
Ngo Anh Vien [1]
Lee, SeungGwan [2]
Chung, TaeChoong [3]
Affiliations
[1] Univ Stuttgart, Machine Learning & Robot Lab, D-70174 Stuttgart, Germany
[2] Kyung Hee Univ, Coll Liberal Arts, 1 Seocheon Dong, Yongin 446701, Gyeonggi Do, South Korea
[3] Kyung Hee Univ, Dept Comp Engn, Yongin 446701, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Reinforcement learning; Bayesian reinforcement learning; Hierarchical reinforcement learning; MDP; POMDP; POSMDP; Monte-Carlo tree search; Hierarchical Monte-Carlo planning; Policy gradient SMDP; Resource allocation;
DOI
10.1007/s10489-015-0742-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may lead only to a locally optimal policy, and 2) exploring to gather new knowledge that is expected to improve future performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e., by solving a partially observable Markov decision process (POMDP). Solving this POMDP, however, suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, yielding a hierarchical POMDP formulation that can be solved online with a hierarchical sample-based planning solver; 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro-actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales to much larger problems and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.
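As a rough illustration of the belief-planning idea the abstract refers to (not the paper's hierarchical algorithm), the sketch below performs Bayes-adaptive planning by Monte-Carlo root sampling on a two-armed Bernoulli bandit: the posterior over the unknown reward probabilities is carried along as part of the simulated state, one model is sampled from the posterior per simulation, and the action with the best average simulated return is executed. All names (plan, rollout_value) and constants are invented for this example and are not taken from the paper.

    """
    Illustrative sketch: Bayes-adaptive planning by Monte-Carlo root sampling
    on a two-armed Bernoulli bandit with Beta priors. Simplified rollout-based
    planner, not the paper's hierarchical Monte-Carlo algorithm.
    """
    import random

    HORIZON = 5        # depth of the simulated lookahead in belief space
    SIMULATIONS = 200  # Monte-Carlo simulations per candidate action

    def rollout_value(action, belief, theta, depth):
        """Simulated return of `action` under sampled model `theta`,
        continuing with a greedy-on-belief rollout policy."""
        if depth == 0:
            return 0.0
        reward = 1.0 if random.random() < theta[action] else 0.0
        # Bayes-adaptive step: the belief is part of the simulated state.
        a, b = belief[action]
        belief = dict(belief)
        belief[action] = (a + reward, b + (1.0 - reward))
        # Rollout policy: pick the arm with the highest posterior mean.
        next_action = max(belief, key=lambda k: belief[k][0] / sum(belief[k]))
        return reward + rollout_value(next_action, belief, theta, depth - 1)

    def plan(belief):
        """Choose an action by root-sampling models from the posterior."""
        values = {action: 0.0 for action in belief}
        for action in belief:
            for _ in range(SIMULATIONS):
                # Root sampling: draw one model per simulation from the posterior.
                theta = {k: random.betavariate(a, b) for k, (a, b) in belief.items()}
                values[action] += rollout_value(action, dict(belief), theta, HORIZON)
        return max(values, key=lambda k: values[k])

    if __name__ == "__main__":
        true_theta = {"arm0": 0.3, "arm1": 0.7}            # hidden ground truth
        belief = {"arm0": (1.0, 1.0), "arm1": (1.0, 1.0)}  # Beta(1,1) priors
        total = 0.0
        for step in range(50):
            action = plan(belief)
            reward = 1.0 if random.random() < true_theta[action] else 0.0
            a, b = belief[action]
            belief[action] = (a + reward, b + (1.0 - reward))
            total += reward
        print("collected reward:", total, "final belief:", belief)

The Bayes-adaptive ingredient here is that the simulated rollouts update the belief alongside the state, so the planner's value estimates already account for the information that future observations will bring; the paper's contribution is to add temporal abstraction (macro-actions and subgoals) on top of this kind of sample-based belief planning.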
Pages: 112 - 126
Page count: 15