An Experience Aggregative Reinforcement Learning With Multi-Attribute Decision-Making for Obstacle Avoidance of Wheeled Mobile Robot

Cited by: 4
Authors
Hu, Chunyang [1 ]
Ning, Bin [1 ]
Xu, Meng [2 ]
Gu, Qiong [1 ]
Affiliations
[1] Hubei Univ Arts & Sci, Sch Comp Engn, Xiangyang 441053, Peoples R China
[2] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
Keywords
Task analysis; Collision avoidance; Reinforcement learning; Decision making; Mobile robots; Training; Experience aggregation; Multi-attribute decision-making; Obstacle avoidance; Wheeled mobile robot; Algorithm
DOI
10.1109/ACCESS.2020.3001143
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
A variety of reinforcement learning (RL) methods have been developed for the motion control of robotic systems, which remains an active research topic. However, conventional RL methods often hit a performance bottleneck because the exploration-exploitation dilemma makes it difficult for the robot to choose an appropriate action in the control task. To address this problem and improve learning performance, this work introduces an experience aggregative reinforcement learning method with Multi-Attribute Decision-Making (MADM) to achieve real-time obstacle avoidance for a wheeled mobile robot (WMR). The proposed method clusters experiential samples through experience aggregation, which allows more effective experience storage. Moreover, an action selection policy based on MADM is proposed to make effective use of prior experience. Inspired by hierarchical decision-making, this work decomposes the original obstacle avoidance task into two sub-tasks using a divide-and-conquer approach. Each sub-task is trained individually by double Q-learning with a simple reward function, and each learns an action policy that selects an appropriate action to achieve a single goal. When the sub-tasks are fused, their rewards are standardized to eliminate differences in reward scale between sub-tasks. The proposed method then integrates the prior experience of the three trained sub-tasks via an MADM-based action policy to complete the source task. Simulation results show that the proposed method outperforms competing approaches.
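The per-sub-task training described in the abstract follows standard double Q-learning. Below is a minimal tabular sketch of that update rule, assuming discrete states and actions; the class name, hyperparameters, and (state, action) keying are illustrative stand-ins, not the paper's actual simulation setup.

```python
import random
from collections import defaultdict

class DoubleQLearner:
    """Minimal tabular double Q-learning for one sub-task (sketch)."""

    def __init__(self, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.actions = actions          # discrete action set of the sub-task
        self.alpha = alpha              # learning rate
        self.gamma = gamma              # discount factor
        self.epsilon = epsilon          # exploration rate
        self.q_a = defaultdict(float)   # first Q-table, keyed by (state, action)
        self.q_b = defaultdict(float)   # second Q-table

    def act(self, state):
        # Epsilon-greedy over the sum of both tables.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions,
                   key=lambda a: self.q_a[(state, a)] + self.q_b[(state, a)])

    def update(self, s, a, r, s_next):
        # Randomly pick which table to update; the greedy action is chosen
        # by one table and evaluated by the other, which counteracts the
        # overestimation bias of standard Q-learning.
        if random.random() < 0.5:
            a_star = max(self.actions, key=lambda x: self.q_a[(s_next, x)])
            target = r + self.gamma * self.q_b[(s_next, a_star)]
            self.q_a[(s, a)] += self.alpha * (target - self.q_a[(s, a)])
        else:
            b_star = max(self.actions, key=lambda x: self.q_b[(s_next, x)])
            target = r + self.gamma * self.q_a[(s_next, b_star)]
            self.q_b[(s, a)] += self.alpha * (target - self.q_b[(s, a)])
```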
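For the fusion step, one common way to realize an MADM-based action policy is simple additive weighting over standardized per-sub-task values. The sketch below treats each trained sub-task's Q-values as one attribute of every candidate action and uses min-max standardization to remove scale differences, mirroring the reward standardization the abstract mentions; the paper's exact standardization and scoring rule may differ.

```python
def standardize(values):
    """Min-max normalize a list of per-action values to [0, 1],
    removing scale differences between sub-tasks."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]

def madm_select(sub_task_q, weights, actions):
    """sub_task_q: one dict {action: Q-value} per trained sub-task;
    weights: importance weight per sub-task (attribute);
    returns the action with the best weighted score."""
    # Normalize each sub-task's Q-values so attributes are comparable.
    norm = [standardize([q[a] for a in actions]) for q in sub_task_q]
    # Simple additive weighting: weighted sum of normalized attributes.
    scores = [sum(w * col[i] for w, col in zip(weights, norm))
              for i in range(len(actions))]
    return actions[max(range(len(actions)), key=scores.__getitem__)]
```

With equal weights this reduces to picking the action with the best average normalized value across sub-tasks; unequal weights would let one sub-task, such as collision avoidance near obstacles, dominate the decision.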
Pages: 108179-108190
Page count: 12