Bayes-adaptive hierarchical MDPs

Cited by: 2
Authors
Ngo Anh Vien [1 ]
Lee, SeungGwan [2 ]
Chung, TaeChoong [3 ]
Affiliations
[1] Univ Stuttgart, Machine Learning & Robot Lab, D-70174 Stuttgart, Germany
[2] Kyung Hee Univ, Coll Liberal Arts, 1 Seocheon Dong, Yongin 446701, Gyeonggi Do, South Korea
[3] Kyung Hee Univ, Dept Comp Engn, Yongin 446701, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore
Keywords
Reinforcement learning; Bayesian reinforcement learning; Hierarchical reinforcement learning; MDP; POMDP; POSMDP; Monte-Carlo tree search; Hierarchical Monte-Carlo planning; POLICY GRADIENT SMDP; RESOURCE-ALLOCATION;
DOI
10.1007/s10489-015-0742-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may lead only to a local optimum, and 2) exploring to gather new knowledge that is expected to improve current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e. by solving a partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, resulting in a hierarchical POMDP formulation that can be solved online with a hierarchical sample-based planning solver; 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro-actions to accelerate learning. In the experiment section, we demonstrate that the proposed approach scales to much larger problems and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.
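The belief planning the abstract refers to can be illustrated with a minimal sketch. In a Bayes-adaptive MDP, the belief over unknown transition dynamics is commonly represented by Dirichlet counts, and observing a transition is a one-count conjugate update; a sample-based planner then simulates rollouts against the posterior. The class and method names below are hypothetical illustrations, not the paper's implementation, and the planner itself (hierarchical Monte-Carlo tree search) is omitted:

```python
import random
from collections import defaultdict


class BayesAdaptiveBelief:
    """Dirichlet posterior over the transition model of a discrete MDP.

    The belief state is the table of counts alpha[(s, a)][s']; each
    observed transition increments exactly one count (conjugate update).
    """

    def __init__(self, n_states, prior=1.0):
        self.n_states = n_states
        # Symmetric Dirichlet prior over successor states for every (s, a).
        self.alpha = defaultdict(lambda: [prior] * n_states)

    def update(self, s, a, s_next):
        # One observed transition = one additional count.
        self.alpha[(s, a)][s_next] += 1.0

    def expected_prob(self, s, a, s_next):
        # Posterior-mean transition probability P(s' | s, a).
        counts = self.alpha[(s, a)]
        return counts[s_next] / sum(counts)

    def sample_next_state(self, s, a, rng=random):
        # Draw s' from the posterior-mean model, as a sample-based
        # planner would do when simulating a rollout step.
        counts = self.alpha[(s, a)]
        r, acc = rng.random() * sum(counts), 0.0
        for s_next, c in enumerate(counts):
            acc += c
            if r < acc:
                return s_next
        return self.n_states - 1
```

Planning in this belief space is what makes exploration Bayes-optimal: an action is valued not only for immediate reward but for how the resulting count update sharpens the posterior.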
Pages: 112-126 (15 pages)