A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot

Cited by: 179
Authors
Konar, Amit [1 ]
Chakraborty, Indrani Goswami [1 ]
Singh, Sapam Jitu [1 ]
Jain, Lakhmi C. [2 ]
Nagar, Atulya K. [3 ]
Affiliations
[1] Jadavpur Univ, Dept Elect & Telecommun Engn, Kolkata 700032, India
[2] Univ S Australia, Adelaide, SA 5000, Australia
[3] Liverpool Hope Univ, Liverpool L16 9JD, Merseyside, England
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2013, Vol. 43, No. 5
Keywords
Agent; mobile robots; path planning; Q-learning; reinforcement learning; ALGORITHM;
DOI
10.1109/TSMCA.2012.2227719
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Subject classification code
0812
Abstract
This paper presents a new deterministic Q-learning that assumes knowledge of the distance from the current state to both the next state and the goal. This knowledge is used to update each entry in the Q-table only once, by exploiting four derived properties of Q-learning, instead of updating entries repeatedly as in classical Q-learning. The proposed algorithm therefore has a much smaller time complexity than its classical counterpart. Furthermore, it stores the Q-value only for the best possible action at a state and thus saves significant storage. Experiments undertaken on simulated mazes and real platforms confirm that the Q-table obtained by the proposed Q-learning, when used for the path-planning application of mobile robots, outperforms both classical and extended Q-learning with respect to three metrics: traversal time, number of states traversed, and number of 90-degree turns required. The reduction in 90-degree turns lowers energy consumption and is therefore of importance in the robotics literature.
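For context, the abstract contrasts the proposed one-shot table update with the classical repeated update. Below is a minimal sketch of that classical baseline, the update rule Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)], run on a toy one-dimensional corridor. The environment, rewards, and parameters are illustrative assumptions, not taken from the paper, and this is not the authors' deterministic variant.

```python
import random

def q_learning_1d(n_states=5, goal=4, alpha=0.5, gamma=0.9,
                  episodes=200, seed=0):
    """Classical Q-learning on a 1-D corridor; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (epsilon = 0.2).
            if rng.random() < 0.2:
                a = rng.choice([0, 1])
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # Classical repeated update; the paper's contribution is to
            # replace this loop with a single update per table entry,
            # using known distances to the next state and the goal.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_1d()
# Greedy policy derived from the learned table for the non-goal states.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

After enough episodes the greedy policy points right (toward the goal) in every non-goal state, which is the behavior the one-shot deterministic update is designed to reach without repeated sweeps.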
Pages: 1141-1153
Page count: 13