Formation Control With Collision Avoidance Through Deep Reinforcement Learning Using Model-Guided Demonstration

Cited by: 66

Authors:

Sui, Zezhi [1,2]

Pu, Zhiqiang [1,2,3]

Yi, Jianqiang [1,2,3]

Wu, Shiguang [1,2]

Affiliations:

[1] Chinese Acad Sci, Inst Automat, Beijing 100190, Peoples R China

[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China

[3] Taizhou Inst Intelligent Mfg, Taizhou 225300, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2021, Vol. 32, No. 6

Keywords:

Collision avoidance; Training; Maintenance engineering; Machine learning; Multi-agent systems; Task analysis; deep reinforcement learning (DRL); formation control; leader-follower; FOLLOWER FORMATION CONTROL; MOBILE ROBOTS; ENVIRONMENT; CONSENSUS; VEHICLES; SYSTEMS;

DOI:

10.1109/TNNLS.2020.3004893

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Generating collision-free, time-efficient paths in an uncertain dynamic environment poses huge challenges for the formation control with collision avoidance (FCCA) problem in a leader-follower structure. In particular, the followers have to take both formation maintenance and collision avoidance into account simultaneously. Unfortunately, most of the existing works are simple combinations of methods dealing with the two problems separately. In this article, a new method based on deep reinforcement learning (RL) is proposed to solve the problem of FCCA. Specifically, the learning-based policy is extended to the field of formation control, which involves a two-stage training framework: an imitation learning (IL) stage followed by an RL stage. In the IL stage, a model-guided method consisting of a consensus theory-based formation controller and an optimal reciprocal collision avoidance strategy is designed to speed up training and increase efficiency. In the RL stage, a compound reward function is presented to guide the training. In addition, we design a formation-oriented network structure to perceive the environment. Long short-term memory is adopted to enable the network structure to perceive the information of obstacles of an uncertain number, and a transfer training approach is adopted to improve the generalization of the network in different scenarios. Numerous representative simulations are conducted, and our method is further deployed to an experimental platform based on a multi-omnidirectional-wheeled car system. The effectiveness and practicability of our proposed method are validated through both the simulation and experiment results.

Citation:

Pages: 2358-2372

Page count: 15

