Automatic Construction of Markov Decision Process Models for Multi-Agent Reinforcement Learning

Cited by: 1
Authors
Young, Darrell L. [1]
Eccles, Chris [1]
Affiliations
[1] Raytheon Intelligence & Space, 22210 Pacific Blvd, Sterling, VA 20166 USA
Source
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS II | 2020 / Vol. 11413
Keywords
multi-agent reinforcement learning; graph; communication protocol; policy; training; Markov Decision Process;
DOI
10.1117/12.2557823
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper describes our current multi-agent reinforcement learning concepts, which are intended to complement or replace classic operational planning techniques. A neural planner is used to generate many possible paths. Training the neural planner is a one-time task that uses a physics-based model to create the training data. The outputs of the neural planner are achievable paths. The path intersections are represented as decision waypoint nodes in a graph. The graph is interpreted as a Markov Decision Process (MDP). Multi-agent reinforcement learning algorithms train much faster on the resulting MDP than on a non-discretized space, because only high-level decision waypoints are considered. The technique is applicable to multiple domains, including air, space, land, sea, and cyber-physical.
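
The step from paths to an MDP described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the hard-coded grid paths stand in for neural planner output, the function names `build_waypoint_graph` and `value_iteration` are hypothetical, the edge cost (number of path steps between waypoints) and the dead-end penalty are assumptions, and a single-agent value iteration stands in for the multi-agent training the paper addresses.

```python
from collections import defaultdict


def build_waypoint_graph(paths):
    """Collapse paths into a graph whose nodes are decision waypoints.

    Assumption: a decision waypoint is a path endpoint or any point that
    two or more paths share. Edge cost is the number of path steps between
    consecutive decision waypoints (an illustrative choice).
    """
    # Record which paths visit each point.
    visits = defaultdict(set)
    for index, path in enumerate(paths):
        for point in path:
            visits[point].add(index)

    # Intersections (shared points) plus the start and end of every path.
    decision = {p for p, ids in visits.items() if len(ids) > 1}
    for path in paths:
        decision.update((path[0], path[-1]))

    # Link consecutive decision waypoints along each path.
    graph = defaultdict(dict)
    for path in paths:
        prev, steps = path[0], 0
        for point in path[1:]:
            steps += 1
            if point in decision:
                # Keep the cheapest edge if two paths link the same pair.
                graph[prev][point] = min(steps, graph[prev].get(point, steps))
                prev, steps = point, 0
    for node in decision:
        graph.setdefault(node, {})
    return dict(graph)


def value_iteration(graph, goal, gamma=0.95, sweeps=100, dead_end=-1000.0):
    """Treat the graph as a deterministic MDP and solve it.

    States are waypoints, actions are outgoing edges, and the reward for
    taking an edge is its negative cost. Non-goal nodes without outgoing
    edges receive a fixed penalty (an assumption of this sketch).
    """
    values = {s: 0.0 if (s == goal or graph[s]) else dead_end for s in graph}
    for _ in range(sweeps):
        for state, edges in graph.items():
            if state == goal or not edges:
                continue
            values[state] = max(-c + gamma * values[n] for n, c in edges.items())
    policy = {
        state: max(edges, key=lambda n: -edges[n] + gamma * values[n])
        for state, edges in graph.items()
        if edges and state != goal
    }
    return values, policy


# Three hypothetical planner outputs on a 3x3 grid, all from (0, 0) to (2, 2).
paths = [
    [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)],
    [(0, 0), (0, 1), (1, 1), (2, 1), (2, 2)],
    [(0, 0), (1, 0), (1, 1), (1, 2), (2, 2)],
]
graph = build_waypoint_graph(paths)
values, policy = value_iteration(graph, goal=(2, 2))
```

In this example the eight grid points visited by the three paths reduce to five decision waypoints, so the learner chooses among branch points instead of individual steps, which is the reduction the abstract attributes to considering only high-level decision waypoints.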
Pages: 14