Multi-Robot Learning Dynamic Obstacle Avoidance in Formation With Information-Directed Exploration

Cited by: 7

Authors:

Cao, Junjie [1]

Wang, Yujie [2]

Liu, Yong [1]

Ni, Xuesong [3]

Affiliations:

[1] Zhejiang Univ, Inst Cyber Syst & Control, Hangzhou 310058, Peoples R China

[2] Zhejiang Univ, Interdisciplinary Ctr Social Sci ICSS, Hangzhou 310058, Peoples R China

[3] Beijing Electromech Engn Inst, Beijing 100074, Peoples R China

Source:

IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2022 / Vol. 6 / Issue 06

Funding:

National Natural Science Foundation of China;

Keywords:

Robots; Collision avoidance; Uncertainty; Neural networks; Robot kinematics; Bayes methods; Reinforcement learning; Multi-Robot formation; dynamic obstacle avoidance; reinforcement learning; exploration;

DOI:

10.1109/TETCI.2021.3127925

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper presents an algorithm that generates distributed collision-free velocities for multiple robots while maintaining formation as much as possible. The adaptive formation problem is cast as a sequential decision-making problem, which is solved using reinforcement learning that trains several distributed policies to avoid dynamic obstacles on top of consensus velocities. We construct the policy with Bayesian Linear Regression based on a neural network (called BNL) to compute the state-action value uncertainty efficiently for sequential decision making. Information-directed sampling is applied in our BNL policy to achieve efficient exploration. By further combining it with distributional reinforcement learning, we can estimate the intrinsic uncertainty of the state-action value globally and more accurately. For continuous control tasks, efficient exploration can be achieved by optimizing a policy with the sampled action-value function from a BNL model. Through our experiments on several contextual bandit and sequential decision-making tasks, we show that exploration with the BNL model improves efficiency in both computation and training samples. By augmenting the consensus velocities with our BNL policy, experiments on multi-robot navigation demonstrate that adaptive formation is achieved.
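As a rough illustration of the two ingredients the abstract names, the sketch below fits a Bayesian linear regression over fixed features for each discrete action and then selects an action with a deterministic information-directed sampling rule (squared regret estimate divided by information gain). It is not the paper's implementation. The function names `blr_posterior` and `ids_action`, the Gaussian prior, the constant observation-noise variance, and the confidence scale `lam` are all assumptions made for this example; in particular, the abstract says the intrinsic uncertainty is estimated with distributional reinforcement learning, whereas here it is just a fixed number, and the features would come from a neural network rather than being supplied by hand.

```python
import numpy as np

def blr_posterior(Phi, y, noise_var=1.0, prior_var=1.0):
    """Gaussian posterior over linear weights w for y = Phi @ w + noise.

    Phi: (n, d) feature matrix (assumed here to be fixed features).
    Returns the posterior mean (d,) and covariance (d, d).
    """
    d = Phi.shape[1]
    precision = Phi.T @ Phi / noise_var + np.eye(d) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ Phi.T @ y / noise_var
    return mean, cov

def ids_action(phi, posteriors, noise_var=1.0, lam=1.0, eps=1e-8):
    """Pick the action minimising (regret estimate)^2 / information gain.

    phi: (d,) feature vector of the current state.
    posteriors: one (mean, cov) pair per discrete action.
    noise_var: assumed constant intrinsic (return) variance.
    """
    mu = np.array([phi @ m for m, _ in posteriors])
    var = np.array([phi @ c @ phi for _, c in posteriors])
    sd = np.sqrt(var)
    # Conservative regret: best upper bound minus this action's lower bound.
    regret = np.max(mu + lam * sd) - (mu - lam * sd)
    # Information gain grows with the ratio of epistemic to intrinsic variance.
    info = np.log1p(var / noise_var) + eps
    return int(np.argmin(regret ** 2 / info))
```

With one posterior per action, a call such as `ids_action(phi, [post_a0, post_a1])` returns the index of the chosen action. An action with a high estimated value and an action with large remaining uncertainty both obtain a low ratio, which is the exploration behaviour the rule is meant to provide.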

Citation

Pages: 1357-1367

Page count: 11

