A Deterministic Improved Q-Learning for Path Planning of a Mobile Robot

Cited by: 179
Authors
Konar, Amit [1 ]
Chakraborty, Indrani Goswami [1 ]
Singh, Sapam Jitu [1 ]
Jain, Lakhmi C. [2 ]
Nagar, Atulya K. [3 ]
Affiliations
[1] Jadavpur Univ, Dept Elect & Telecommun Engn, Kolkata 700032, India
[2] Univ S Australia, Adelaide, SA 5000, Australia
[3] Liverpool Hope Univ, Liverpool L16 9JD, Merseyside, England
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2013, Vol. 43, No. 5
Keywords
Agent; mobile robots; path planning; Q-learning; reinforcement learning; ALGORITHM;
DOI
10.1109/TSMCA.2012.2227719
Chinese Library Classification (CLC)
TP [Automation and computer technology]
Subject classification code
0812
Abstract
This paper presents a new deterministic Q-learning that assumes knowledge of the distance from the current state to both the next state and the goal. This knowledge is used to update each entry in the Q-table only once, by exploiting four derived properties of Q-learning, instead of updating entries repeatedly as in classical Q-learning. The proposed algorithm therefore has a much smaller time complexity than its classical counterpart. Furthermore, it stores the Q-value only for the best possible action at a state and thus saves significant storage. Experiments undertaken on simulated mazes and real platforms confirm that the Q-table obtained by the proposed Q-learning, when used for the path-planning application of mobile robots, outperforms both classical and extended Q-learning with respect to three metrics: traversal time, number of states traversed, and number of 90-degree turns required. The reduction in 90-degree turns lowers energy consumption and is therefore of importance in the robotics literature.
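For context, the abstract contrasts the proposed one-shot table update with the classical repeated update. Below is a minimal sketch of that classical baseline, the update rule Q(s,a) ← Q(s,a) + α[r + γ max Q(s',·) − Q(s,a)], run on a toy one-dimensional corridor. The environment, rewards, and parameters are illustrative assumptions, not taken from the paper, and this is not the authors' deterministic variant.

```python
import random

def q_learning_1d(n_states=5, goal=4, alpha=0.5, gamma=0.9,
                  episodes=200, seed=0):
    """Classical Q-learning on a 1-D corridor; actions: 0 = left, 1 = right."""
    rng = random.Random(seed)
    Q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy action selection (epsilon = 0.2).
            if rng.random() < 0.2:
                a = rng.choice([0, 1])
            else:
                a = max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
            r = 1.0 if s2 == goal else 0.0
            # Classical repeated update; the paper's contribution is to
            # replace this loop with a single update per table entry,
            # using known distances to the next state and the goal.
            Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
            s = s2
    return Q

Q = q_learning_1d()
# Greedy policy derived from the learned table for the non-goal states.
policy = [max((0, 1), key=lambda a: Q[s][a]) for s in range(4)]
```

After enough episodes the greedy policy points right (toward the goal) in every non-goal state, which is the behavior the one-shot deterministic update is designed to reach without repeated sweeps.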
Pages: 1141-1153
Page count: 13