Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning

Cited by: 20
Authors
Ma, Aaron [1 ]
Ouimet, Michael [2 ]
Cortes, Jorge [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Mech & Aerosp Engn, La Jolla, CA 92093 USA
[2] Naval Informat Warfare Ctr Pacific, San Diego, CA USA
Keywords
Reinforcement learning; Multi-agent planning; Distributed robotics; Semi-Markov decision processes; Markov decision processes; Upper confidence bound tree search; Hierarchical planning; Hierarchical Markov decision processes; Model-based reinforcement learning; Swarm robotics; Dynamic domain reduction; Submodularity; POMDPs
DOI
10.1007/s10514-019-09871-2
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider scenarios where a swarm of unmanned vehicles (UxVs) seeks to satisfy a number of diverse, spatially distributed objectives. The UxVs strive to determine an efficient plan to service the objectives while operating in a coordinated fashion. We focus on developing autonomous high-level planning, where low-level controls are leveraged from previous work in distributed motion, target tracking, localization, and communication. We rely on the use of state and action abstractions in a Markov decision process framework to introduce a hierarchical algorithm, Dynamic Domain Reduction for Multi-Agent Planning, that enables multi-agent planning for large multi-objective environments. Our analysis establishes the correctness of our search procedure within specific subsets of the environment, termed 'sub-environments', and characterizes the algorithm's performance with respect to the optimal trajectories in single-agent and sequential multi-agent deployment scenarios using tools from submodularity. Simulated results show significant improvement over using a standard Monte Carlo tree search in an environment with large state and action spaces.
Pages: 485-503
Page count: 19
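The keywords and abstract refer to upper confidence bound tree search, the Monte Carlo tree search variant the authors compare against. As a generic illustration only (not the paper's Dynamic Domain Reduction algorithm), here is a minimal UCT sketch on a hypothetical toy problem: an agent on a line of integers chooses -1/+1 moves and is rewarded for reaching a goal cell. All names (`Node`, `GOAL`, `HORIZON`, etc.) are assumptions made for this sketch.

```python
import math
import random

ACTIONS = (-1, 1)       # hypothetical toy action set: step left or right
GOAL, HORIZON = 3, 10   # reward 1.0 for reaching cell 3 within 10 steps

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # cumulative rollout reward

def step(state, action):
    return state + action

def rollout(state, depth):
    # Random playout from `state` with the remaining step budget.
    for _ in range(max(depth, 0)):
        if state == GOAL:
            return 1.0
        state = step(state, random.choice(ACTIONS))
    return 1.0 if state == GOAL else 0.0

def uct_select(node, c=1.4):
    # UCB1 rule: balance average value (exploitation) against
    # an exploration bonus for rarely tried children.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def search(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and node.state != GOAL:
            node = uct_select(node)
            depth += 1
        # Expansion: add one untried action, if any remain.
        if node.state != GOAL:
            action = random.choice(
                [a for a in ACTIONS if a not in node.children])
            child = Node(step(node.state, action), parent=node)
            node.children[action] = child
            node, depth = child, depth + 1
        # Simulation and backpropagation.
        reward = rollout(node.state, HORIZON - depth)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

With the goal to the right of the start state, the search concentrates visits on the +1 action; the paper's contribution, by contrast, is to make this style of search tractable in large multi-agent spaces by restricting it to dynamically chosen sub-environments.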