Hierarchical reinforcement learning via dynamic subspace search for multi-agent planning

Cited by: 20
Authors
Ma, Aaron [1 ]
Ouimet, Michael [2 ]
Cortes, Jorge [1 ]
Affiliations
[1] Univ Calif San Diego, Dept Mech & Aerosp Engn, La Jolla, CA 92093 USA
[2] Naval Informat Warfare Ctr Pacific, San Diego, CA USA
Keywords
Reinforcement learning; Multi-agent planning; Distributed robotics; Semi-Markov decision processes; Markov decision processes; Upper confidence bound tree search; Hierarchical planning; Hierarchical Markov decision processes; Model-based reinforcement learning; Swarm robotics; Dynamic domain reduction; Submodularity; POMDPs
DOI
10.1007/s10514-019-09871-2
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
We consider scenarios where a swarm of unmanned vehicles (UxVs) seeks to satisfy a number of diverse, spatially distributed objectives. The UxVs strive to determine an efficient plan to service the objectives while operating in a coordinated fashion. We focus on developing autonomous high-level planning, where low-level controls are leveraged from previous work in distributed motion, target tracking, localization, and communication. We rely on the use of state and action abstractions in a Markov decision process framework to introduce a hierarchical algorithm, Dynamic Domain Reduction for Multi-Agent Planning, that enables multi-agent planning for large multi-objective environments. Our analysis establishes the correctness of our search procedure within specific subsets of the environment, termed 'sub-environments', and characterizes the algorithm's performance with respect to the optimal trajectories in single-agent and sequential multi-agent deployment scenarios using tools from submodularity. Simulated results show significant improvement over using a standard Monte Carlo tree search in an environment with large state and action spaces.
Pages: 485-503
Page count: 19
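The keywords and abstract refer to upper confidence bound tree search, the Monte Carlo tree search variant the authors compare against. As a generic illustration only (not the paper's Dynamic Domain Reduction algorithm), here is a minimal UCT sketch on a hypothetical toy problem: an agent on a line of integers chooses -1/+1 moves and is rewarded for reaching a goal cell. All names (`Node`, `GOAL`, `HORIZON`, etc.) are assumptions made for this sketch.

```python
import math
import random

ACTIONS = (-1, 1)       # hypothetical toy action set: step left or right
GOAL, HORIZON = 3, 10   # reward 1.0 for reaching cell 3 within 10 steps

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}   # action -> child Node
        self.visits = 0
        self.value = 0.0     # cumulative rollout reward

def step(state, action):
    return state + action

def rollout(state, depth):
    # Random playout from `state` with the remaining step budget.
    for _ in range(max(depth, 0)):
        if state == GOAL:
            return 1.0
        state = step(state, random.choice(ACTIONS))
    return 1.0 if state == GOAL else 0.0

def uct_select(node, c=1.4):
    # UCB1 rule: balance average value (exploitation) against
    # an exploration bonus for rarely tried children.
    return max(node.children.values(),
               key=lambda ch: ch.value / ch.visits
               + c * math.sqrt(math.log(node.visits) / ch.visits))

def search(root_state, iterations=2000):
    root = Node(root_state)
    for _ in range(iterations):
        node, depth = root, 0
        # Selection: descend while the node is fully expanded.
        while len(node.children) == len(ACTIONS) and node.state != GOAL:
            node = uct_select(node)
            depth += 1
        # Expansion: add one untried action, if any remain.
        if node.state != GOAL:
            action = random.choice(
                [a for a in ACTIONS if a not in node.children])
            child = Node(step(node.state, action), parent=node)
            node.children[action] = child
            node, depth = child, depth + 1
        # Simulation and backpropagation.
        reward = rollout(node.state, HORIZON - depth)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda a: root.children[a].visits)
```

With the goal to the right of the start state, the search concentrates visits on the +1 action; the paper's contribution, by contrast, is to make this style of search tractable in large multi-agent spaces by restricting it to dynamically chosen sub-environments.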