Automatic Construction of Markov Decision Process Models for Multi-Agent Reinforcement Learning

Cited by: 1

Authors:

Young, Darrell L. [1]

Eccles, Chris [1]

Affiliations:

[1] Raytheon Intelligence & Space, 22210 Pacific Blvd, Sterling, VA 20166 USA

Source:

ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II | 2020 / Vol. 11413

Keywords:

multi-agent reinforcement learning; graph; communication protocol; policy; training; Markov Decision Process;

DOI:

10.1117/12.2557823

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This paper describes our current multi-agent reinforcement learning concepts to complement or replace classic operational planning techniques. A neural planner is used to generate many possible paths. Training of the neural planner is a one-time task using a physics-based model to create the training data. The outputs of the neural planner are achievable paths. The path intersections are represented as decision waypoint nodes in a graph. The graph is interpreted as a Markov Decision Process (MDP). Training multi-agent reinforcement learning algorithms on the resulting MDP is much faster than training on non-discretized spaces because only high-level decision waypoints are considered. The technique is applicable to multiple domains including air, space, land, sea, and cyber-physical domains.
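
The idea of reading a waypoint graph as an MDP can be illustrated with a minimal sketch. This is not the authors' implementation: the graph, the segment costs, the names `GRAPH`, `GOAL`, `value_iteration`, and `greedy_policy`, and the single-agent deterministic setting are all assumptions made for illustration. States are decision waypoints, actions are choices of outgoing path segment, and the reward is the negative segment cost.

```python
# Hypothetical waypoint graph (illustrative only, not from the paper):
# nodes are decision waypoints, edges are path segments with an assumed cost.
GRAPH = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"C": 1.0, "D": 5.0},
    "C": {"D": 1.0},
    "D": {},  # goal waypoint, the only node without outgoing segments
}
GOAL = "D"


def value_iteration(graph, goal, gamma=1.0, tol=1e-9):
    """Treat the graph as a deterministic MDP and compute state values.

    State = waypoint, action = outgoing segment, reward = -segment cost.
    Assumes every non-goal waypoint can reach the goal.
    """
    values = {node: 0.0 for node in graph}
    while True:
        delta = 0.0
        for node, edges in graph.items():
            if node == goal or not edges:
                continue  # terminal state keeps value 0
            best = max(-cost + gamma * values[nxt] for nxt, cost in edges.items())
            delta = max(delta, abs(best - values[node]))
            values[node] = best
        if delta < tol:
            return values


def greedy_policy(graph, values, goal, gamma=1.0):
    """Pick, at each non-terminal waypoint, the segment with the best backed-up value."""
    policy = {}
    for node, edges in graph.items():
        if node == goal or not edges:
            continue
        policy[node] = max(edges, key=lambda nxt: -edges[nxt] + gamma * values[nxt])
    return policy


values = value_iteration(GRAPH, GOAL)
policy = greedy_policy(GRAPH, values, GOAL)
print(values)
print(policy)
```

Because only the waypoints are states, the table being solved has one entry per node rather than one per point in a continuous space, which is the source of the speedup the abstract describes. A multi-agent version would need a joint state over all agents' waypoints, which this sketch does not attempt.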

Pages: 14