Differentiable Learning of Scalable Multi-Agent Navigation Policies

Cited by: 4

Authors:

Ye, Xiaohan [1,2]

Pan, Zherong [1]

Gao, Xifeng

Wu, Kui

Ren, Bo [2]

Affiliations:

[1] Tencent, LightSpeed Studios, Shenzhen 518054, People's Republic of China

[2] Nankai University, College of Computer Science, Tianjin 300350, People's Republic of China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2023 / Vol. 8 / Issue 04

Keywords:

Navigation; Task analysis; Heuristic algorithms; Trajectory; Training; Kernel; Mathematical models; Multi-robot systems; robotics and automation; swarm robotics;

DOI:

10.1109/LRA.2023.3248440

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

We present an end-to-end differentiable learning algorithm for multi-agent navigation policies. Compared with prior model-free learning algorithms, our method leads to a significant speedup via the gradient information. Our key innovation lies in a novel differentiability analysis of the optimization-based crowd simulation algorithm via the implicit function theorem. Inspired by continuum multi-agent modeling techniques, we further propose a kernel-based policy parameterization, allowing our learned policy to scale up to an arbitrary number of agents without re-training. We evaluate our algorithm on two tasks in obstacle-rich environments, partially labeled navigation and evacuation, for which loss functions can be defined, making the entire task learnable in an end-to-end manner. The results show that our method can achieve more than one order of magnitude speedup over model-free baselines and readily scale to unseen target configurations and agent sizes.


Pages: 2229-2236

Page count: 8

