Intelligent Model Learning Based on Variance for Bayesian Reinforcement Learning

Cited by: 0
Authors
You, Shuhua [1 ]
Liu, Quan [2 ]
Zhang, Zongzhang [1 ]
Wang, Hui [1 ]
Zhang, Xiaofang [1 ]
Affiliations
[1] Soochow Univ, Sch Comp Sci & Technol, Suzhou, Peoples R China
[2] Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun, Peoples R China
Source
2015 IEEE 27TH INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2015) | 2015
Keywords
reinforcement learning; Bayesian dynamic programming; model learning; policy learning; Dirichlet distributions
DOI
10.1109/ICTAI.2015.37
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider a modular approach to reinforcement learning that represents uncertainty in model parameters by maintaining probability distributions over them. The algorithm, which we call MBDP (model-based Bayesian dynamic programming), decomposes into two parallel types of inference: model learning and policy learning. During model learning, we update the posterior distribution over model parameters from the observations that follow each action taken in each state. During policy learning, we solve the MDP by dynamic programming with greedy approximation, so that the agent chooses actions that maximize return under the estimated model. Furthermore, we propose a principled method that uses the variance of the Dirichlet distributions to determine when to learn and relearn the model. We demonstrate that, given sufficient model learning, MBDP finds near-optimal policies with high probability, and experimental results show that MBDP outperforms current state-of-the-art reinforcement learning methods.
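The abstract describes the two inference loops and the variance-based trigger concretely enough for a rough illustration. Below is a minimal Python sketch of that idea, not the authors' implementation: the class and method names (MBDPSketch, update_model, model_uncertain, plan) and the var_threshold parameter are assumptions, and rewards are tracked by a simple running mean. The relearning test uses the standard Dirichlet marginal variance, Var[p_i] = alpha_i (alpha_0 - alpha_i) / (alpha_0^2 (alpha_0 + 1)).

```python
import numpy as np

class MBDPSketch:
    """Minimal sketch of model-based Bayesian dynamic programming (assumed names)."""

    def __init__(self, n_states, n_actions, gamma=0.95, var_threshold=1e-3):
        self.n_states, self.n_actions = n_states, n_actions
        self.gamma = gamma
        self.var_threshold = var_threshold  # assumed tuning parameter, not from the paper
        # Dirichlet pseudo-counts over next states for every (s, a) pair.
        self.alpha = np.ones((n_states, n_actions, n_states))
        self.reward = np.zeros((n_states, n_actions))
        self.visits = np.zeros((n_states, n_actions))

    def update_model(self, s, a, r, s_next):
        """Model learning: Dirichlet posterior update after observing one transition."""
        self.alpha[s, a, s_next] += 1.0
        self.visits[s, a] += 1.0
        # Running mean of observed rewards (one simple modeling choice).
        self.reward[s, a] += (r - self.reward[s, a]) / self.visits[s, a]

    def model_uncertain(self, s, a):
        """Variance test: keep (re)learning (s, a) while its posterior is still diffuse."""
        a0 = self.alpha[s, a].sum()
        var = self.alpha[s, a] * (a0 - self.alpha[s, a]) / (a0 ** 2 * (a0 + 1.0))
        return var.max() > self.var_threshold

    def plan(self, n_iters=200):
        """Policy learning: value iteration on the posterior-mean MDP."""
        T = self.alpha / self.alpha.sum(axis=2, keepdims=True)  # E[p(s' | s, a)]
        V = np.zeros(self.n_states)
        for _ in range(n_iters):
            Q = self.reward + self.gamma * (T @ V)  # Bellman backup
            V = Q.max(axis=1)                       # greedy approximation
        return Q.argmax(axis=1)                     # greedy policy
```

An agent built on this sketch would call update_model after every transition, consult model_uncertain to decide whether a state-action pair still needs data before its estimate is trusted, and call plan to recompute the greedy policy on the posterior-mean MDP.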
Pages: 170-177
Number of pages: 8