Differentiable Learning of Scalable Multi-Agent Navigation Policies

Cited by: 4
Authors
Ye, Xiaohan [1, 2]
Pan, Zherong [1]
Gao, Xifeng
Wu, Kui
Ren, Bo [2]
Affiliations
[1] Tencent, LightSpeed Studios, Shenzhen 518054, Peoples R China
[2] Nankai Univ, Coll Comp Sci, Tianjin 300350, Peoples R China
Keywords
Navigation; Task analysis; Heuristic algorithms; Trajectory; Training; Kernel; Mathematical models; Multi-robot systems; robotics and automation; swarm robotics;
DOI
10.1109/LRA.2023.3248440
Chinese Library Classification number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
We present an end-to-end differentiable learning algorithm for multi-agent navigation policies. Compared with prior model-free learning algorithms, our method leads to a significant speedup via the gradient information. Our key innovation lies in a novel differentiability analysis of the optimization-based crowd simulation algorithm via the implicit function theorem. Inspired by continuum multi-agent modeling techniques, we further propose a kernel-based policy parameterization, allowing our learned policy to scale up to an arbitrary number of agents without re-training. We evaluate our algorithm on two tasks in obstacle-rich environments, partially labeled navigation and evacuation, for which loss functions can be defined, making the entire task learnable in an end-to-end manner. The results show that our method can achieve more than one order of magnitude speedup over model-free baselines and readily scale to unseen target configurations and agent sizes.
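Illustrative note: the abstract's central idea, differentiating the output of an optimization through the implicit function theorem, can be sketched on a toy problem. The sketch below is not the paper's crowd simulator. The energy E(x, theta) = 0.5*||x - theta||^2 + 0.25*K*||x||^4, the constant K, and all function names are assumptions chosen only for illustration. At a minimizer x*, the gradient of E with respect to x vanishes, so dx*/dtheta = -(d2E/dx2)^{-1} (d2E/dx dtheta), which the code checks against central finite differences.

```python
import numpy as np

K = 2.0  # hypothetical stiffness of the toy energy (assumption, not from the paper)

def grad_x(x, theta):
    # Gradient of E(x, theta) = 0.5*||x - theta||^2 + 0.25*K*||x||^4 w.r.t. x.
    return (x - theta) + K * (x @ x) * x

def hess_xx(x):
    # Hessian of E w.r.t. x: (1 + K*||x||^2) * I + 2*K * x x^T.
    n = x.size
    return (1.0 + K * (x @ x)) * np.eye(n) + 2.0 * K * np.outer(x, x)

def solve(theta, iters=50):
    # Newton's method for x* = argmin_x E(x, theta), started at x = theta.
    x = theta.copy()
    for _ in range(iters):
        x = x - np.linalg.solve(hess_xx(x), grad_x(x, theta))
    return x

def jacobian_ift(theta):
    # Implicit function theorem: dx*/dtheta = -H_xx^{-1} @ H_xtheta.
    x_star = solve(theta)
    mixed = -np.eye(theta.size)  # d(grad_x E)/d(theta) for this toy energy
    return -np.linalg.solve(hess_xx(x_star), mixed)

def jacobian_fd(theta, eps=1e-6):
    # Central finite differences through the solver, used only as a check.
    n = theta.size
    J = np.zeros((n, n))
    for j in range(n):
        d = np.zeros(n)
        d[j] = eps
        J[:, j] = (solve(theta + d) - solve(theta - d)) / (2.0 * eps)
    return J

theta = np.array([1.0, 0.5])
J_ift = jacobian_ift(theta)
J_fd = jacobian_fd(theta)
print(np.max(np.abs(J_ift - J_fd)))  # should be close to zero
```

The implicit-function route needs one linear solve at the converged point, whereas the finite-difference check re-runs the solver twice per parameter, which hints at why analytic gradients can be cheaper than gradient-free estimates.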
Pages: 2229-2236
Page count: 8