Bayes-adaptive hierarchical MDPs

Cited by: 2
Authors
Ngo Anh Vien [1]
Lee, SeungGwan [2]
Chung, TaeChoong [3]
Affiliations
[1] Univ Stuttgart, Machine Learning & Robot Lab, D-70174 Stuttgart, Germany
[2] Kyung Hee Univ, Coll Liberal Arts, 1 Seocheon Dong, Yongin 446701, Gyeonggi Do, South Korea
[3] Kyung Hee Univ, Dept Comp Engn, Yongin 446701, Gyeonggi Do, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Reinforcement learning; Bayesian reinforcement learning; Hierarchical reinforcement learning; MDP; POMDP; POSMDP; Monte-Carlo tree search; Hierarchical Monte-Carlo planning; Policy gradient SMDP; Resource allocation;
DOI
10.1007/s10489-015-0742-2
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may lead only to a locally optimal policy, and 2) exploring to gather new knowledge that is expected to improve future performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e., by solving a partially observable Markov decision process (POMDP). Solving this POMDP, however, suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, yielding a hierarchical POMDP formulation that can be solved online with a hierarchical sample-based planning solver; 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro-actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales to much larger problems and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.
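As a rough illustration of the belief-planning idea the abstract refers to (not the paper's hierarchical algorithm), the sketch below performs Bayes-adaptive planning by Monte-Carlo root sampling on a two-armed Bernoulli bandit: the posterior over the unknown reward probabilities is carried along as part of the simulated state, one model is sampled from the posterior per simulation, and the action with the best average simulated return is executed. All names (plan, rollout_value) and constants are invented for this example and are not taken from the paper.

    """
    Illustrative sketch: Bayes-adaptive planning by Monte-Carlo root sampling
    on a two-armed Bernoulli bandit with Beta priors. Simplified rollout-based
    planner, not the paper's hierarchical Monte-Carlo algorithm.
    """
    import random

    HORIZON = 5        # depth of the simulated lookahead in belief space
    SIMULATIONS = 200  # Monte-Carlo simulations per candidate action

    def rollout_value(action, belief, theta, depth):
        """Simulated return of `action` under sampled model `theta`,
        continuing with a greedy-on-belief rollout policy."""
        if depth == 0:
            return 0.0
        reward = 1.0 if random.random() < theta[action] else 0.0
        # Bayes-adaptive step: the belief is part of the simulated state.
        a, b = belief[action]
        belief = dict(belief)
        belief[action] = (a + reward, b + (1.0 - reward))
        # Rollout policy: pick the arm with the highest posterior mean.
        next_action = max(belief, key=lambda k: belief[k][0] / sum(belief[k]))
        return reward + rollout_value(next_action, belief, theta, depth - 1)

    def plan(belief):
        """Choose an action by root-sampling models from the posterior."""
        values = {action: 0.0 for action in belief}
        for action in belief:
            for _ in range(SIMULATIONS):
                # Root sampling: draw one model per simulation from the posterior.
                theta = {k: random.betavariate(a, b) for k, (a, b) in belief.items()}
                values[action] += rollout_value(action, dict(belief), theta, HORIZON)
        return max(values, key=lambda k: values[k])

    if __name__ == "__main__":
        true_theta = {"arm0": 0.3, "arm1": 0.7}            # hidden ground truth
        belief = {"arm0": (1.0, 1.0), "arm1": (1.0, 1.0)}  # Beta(1,1) priors
        total = 0.0
        for step in range(50):
            action = plan(belief)
            reward = 1.0 if random.random() < true_theta[action] else 0.0
            a, b = belief[action]
            belief[action] = (a + reward, b + (1.0 - reward))
            total += reward
        print("collected reward:", total, "final belief:", belief)

The Bayes-adaptive ingredient here is that the simulated rollouts update the belief alongside the state, so the planner's value estimates already account for the information that future observations will bring; the paper's contribution is to add temporal abstraction (macro-actions and subgoals) on top of this kind of sample-based belief planning.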
Pages: 112 - 126
Page count: 15