Hybrid MDP based integrated hierarchical Q-learning

Cited by: 16
Authors
Chen, Chunlin [2,3]
Dong, Daoyi [1,4]
Li, Han-Xiong [5]
Tarn, Tzyh-Jong [6]
Affiliations
[1] Zhejiang Univ, State Key Lab Ind Control Technol, Inst Cyber Syst & Control, Hangzhou 310027, Zhejiang, Peoples R China
[2] Nanjing Univ, State Key Lab Novel Software Technol, Nanjing 210093, Peoples R China
[3] Nanjing Univ, Dept Control & Syst Engn, Nanjing 210093, Peoples R China
[4] Univ New S Wales, Australian Def Force Acad, Sch Engn & Informat Technol, Canberra, ACT 2600, Australia
[5] City Univ Hong Kong, Dept Mfg Engn & Engn Management, Hong Kong 999077, Hong Kong, Peoples R China
[6] Washington Univ, Dept Elect & Syst Engn, St Louis, MO 63130 USA
Funding
Australian Research Council; National Natural Science Foundation of China;
Keywords
reinforcement learning; hierarchical Q-learning; hybrid MDP; temporal abstraction; REINFORCEMENT; NAVIGATION; ROBOT; ALGORITHMS; BEHAVIOR; SYSTEMS;
DOI
10.1007/s11432-011-4332-6
CLC Classification Number
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
As a widely used reinforcement learning method, Q-learning is bedeviled by the curse of dimensionality: the computational complexity grows dramatically with the size of the state-action space. To combat this difficulty, an integrated hierarchical Q-learning framework is proposed based on a hybrid Markov decision process (MDP) with temporal abstraction, instead of the simple MDP. The learning process is naturally organized into multiple levels, e.g., a quantitative (lower) level and a qualitative (upper) level, which are modeled as an MDP and a semi-MDP (SMDP), respectively. This hierarchical control architecture constitutes a hybrid MDP as the model of hierarchical Q-learning, which bridges the two levels of learning. The proposed hierarchical Q-learning scales up well and speeds up learning through the upper-level learning process, and hence provides an effective integrated learning and control scheme for complex problems. Several experiments are carried out on a puzzle problem in a gridworld environment and on a navigation control problem for a mobile robot. The experimental results demonstrate the effectiveness and efficiency of the proposed approach.
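The abstract describes two coupled learning levels, one over primitive actions (MDP) and one over temporally extended behaviors (SMDP). As a rough illustrative sketch only, and not the authors' published algorithm, the Python fragment below shows the two value updates such an MDP/SMDP hierarchy typically uses: a one-step Q-learning update at the quantitative level and an SMDP Q-learning update at the qualitative level, where the discount factor is raised to the option's duration. All function names, state labels, options, and numbers here are hypothetical.

# Minimal sketch (assumed interfaces, not the paper's exact method):
# lower (quantitative) level = standard Q-learning over primitive actions,
# upper (qualitative) level = SMDP Q-learning over temporally extended options.

from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95

q_lower = defaultdict(float)  # Q(s, a): primitive state-action values (MDP level)
q_upper = defaultdict(float)  # Q(z, o): abstract state-option values (SMDP level)

def update_lower(s, a, r, s_next, actions):
    # One-step Q-learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    best_next = max(q_lower[(s_next, a2)] for a2 in actions)
    q_lower[(s, a)] += ALPHA * (r + GAMMA * best_next - q_lower[(s, a)])

def update_upper(z, o, cum_reward, tau, z_next, options):
    # SMDP Q-learning: the option o ran for tau primitive steps and returned
    # the discounted reward it accumulated, so the bootstrap term is
    # discounted by gamma**tau rather than gamma.
    best_next = max(q_upper[(z_next, o2)] for o2 in options)
    q_upper[(z, o)] += ALPHA * (cum_reward + GAMMA ** tau * best_next
                                - q_upper[(z, o)])

if __name__ == "__main__":
    actions = ["up", "down", "left", "right"]
    options = ["goto_room_A", "goto_room_B"]
    # Lower level: one primitive transition in a gridworld (illustrative values).
    update_lower(s=(2, 3), a="right", r=-1.0, s_next=(2, 4), actions=actions)
    # Upper level: an option that ran for 5 steps and accumulated -4.1 discounted reward.
    update_upper(z="room_A", o="goto_room_B", cum_reward=-4.1, tau=5,
                 z_next="room_B", options=options)
    print(dict(q_lower), dict(q_upper))

The gamma**tau term is what distinguishes the upper-level SMDP update from the flat MDP update: it lets the qualitative level learn over decisions that span many primitive steps, which is the source of the speed-up claimed for the hierarchical scheme.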
Pages: 2279-2294
Number of pages: 16