Deep Reinforcement Learning Approach to Air Traffic Optimization Using the MuZero Algorithm

Cited by: 0

Authors:

Yilmaz, Emre [1]

Sanni, Olatunde [1]

Herniczek, Mark T. Kotwicz [1]

German, Brian J. [1]

Affiliations:

[1] Georgia Inst Technol, Sch Aerosp Engn, 270 Ferst Dr, Atlanta, GA 30332 USA

Source:

AIAA AVIATION 2021 FORUM | 2021

Keywords:

GO; GAME;

DOI:

Not available

Chinese Library Classification (CLC) number:

V [Aeronautics, Astronautics];

Subject classification code:

08 ; 0825 ;

Abstract:

Urban Air Mobility (UAM) and Unmanned Aircraft Systems (UAS) are anticipated to cause significant growth in air traffic, which will require novel Air Traffic Management (ATM) solutions to problems such as path planning with separation constraints. In this paper, we apply MuZero, a deep reinforcement learning algorithm recently introduced by DeepMind [1], to path planning problems in dynamic air traffic environments. MuZero has demonstrated exceptional progress in Artificial Intelligence (AI) game playing. To formulate the path planning problem, we consider a sequential trajectory allocation approach that acts on a "first-come, first-served" basis for both online planning and moving time horizon problems. Initial results show that agents trained with the MuZero-based obstacle avoidance framework can learn to mitigate collisions without requiring any knowledge of the domain or the game rules.
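The sequential, first-come, first-served allocation idea mentioned in the abstract can be illustrated with a small sketch. This is not the paper's MuZero-based method: the learned planner is swapped for a plain breadth-first search over (cell, time) states, and the grid world, the function names `plan_path` and `allocate_fcfs`, and the rule that an aircraft leaves the airspace on reaching its goal are all assumptions made for illustration. Each request is planned in arrival order, and every trajectory already allocated is treated as a moving obstacle by later requests.

```python
from collections import deque

# Hypothetical action set: wait in place or move one cell along an axis.
MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]

def plan_path(start, goal, size, cells, moves, max_t=50):
    """Breadth-first search over (cell, time) states on a size x size grid.

    cells: set of (cell, t) already occupied by earlier aircraft.
    moves: set of (from_cell, to_cell, t_arrival) flown by earlier aircraft.
    Returns a list of cells indexed by time step, or None if no path is found.
    """
    if (start, 0) in cells:
        return None
    queue = deque([(start, 0, [start])])
    seen = {(start, 0)}
    while queue:
        pos, t, path = queue.popleft()
        if pos == goal:
            return path
        if t == max_t:
            continue
        for dx, dy in MOVES:
            nxt = (pos[0] + dx, pos[1] + dy)
            if not (0 <= nxt[0] < size and 0 <= nxt[1] < size):
                continue
            # Reject cells already visited at this time or occupied by earlier traffic.
            if (nxt, t + 1) in seen or (nxt, t + 1) in cells:
                continue
            # Reject head-on swaps with an earlier aircraft.
            if (nxt, pos, t + 1) in moves:
                continue
            seen.add((nxt, t + 1))
            queue.append((nxt, t + 1, path + [nxt]))
    return None

def allocate_fcfs(requests, size):
    """Plan requests in arrival order; earlier plans constrain later ones."""
    cells, moves, plans = set(), set(), []
    for start, goal in requests:
        path = plan_path(start, goal, size, cells, moves)
        plans.append(path)
        if path is None:
            continue
        # Reserve the allocated trajectory (assumption: the aircraft
        # leaves the airspace once it reaches its goal).
        for t, pos in enumerate(path):
            cells.add((pos, t))
            if t > 0:
                moves.add((path[t - 1], pos, t))
    return plans

# Two aircraft flying head-on along the same row of a 3 x 3 grid.
plans = allocate_fcfs([((0, 0), (2, 0)), ((2, 0), (0, 0))], size=3)
```

In this toy case the first request receives the direct route, and the second request, planned afterwards, has to detour around the reserved trajectory. That ordering effect is the defining property of a first-come, first-served scheme.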

Pages: 14

References (7 entries):

[1] Busoniu L, 2010, STUD COMPUT INTELL, V310, P183

[2] Mnih V., 2013, arXiv:1312.5602, DOI 10.48550/arXiv.1312.5602

[3] Human-level control through deep reinforcement learning [J]. Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. NATURE, 2015, 518 (7540): 529-533

[4] A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play [J]. Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis. SCIENCE, 2018, 362 (6419): 1140-+

[5] Mastering the game of Go without human knowledge [J]. Silver, David; Schrittwieser, Julian; Simonyan, Karen; Antonoglou, Ioannis; Huang, Aja; Guez, Arthur; Hubert, Thomas; Baker, Lucas; Lai, Matthew; Bolton, Adrian; Chen, Yutian; Lillicrap, Timothy; Hui, Fan; Sifre, Laurent; van den Driessche, George; Graepel, Thore; Hassabis, Demis. NATURE, 2017, 550 (7676): 354-+

[6] Mastering the game of Go with deep neural networks and tree search [J]. Silver, David; Huang, Aja; Maddison, Chris J.; Guez, Arthur; Sifre, Laurent; van den Driessche, George; Schrittwieser, Julian; Antonoglou, Ioannis; Panneershelvam, Veda; Lanctot, Marc; Dieleman, Sander; Grewe, Dominik; Nham, John; Kalchbrenner, Nal; Sutskever, Ilya; Lillicrap, Timothy; Leach, Madeleine; Kavukcuoglu, Koray; Graepel, Thore; Hassabis, Demis. NATURE, 2016, 529 (7587): 484-+

[7] Scalable Multi-Agent Computational Guidance with Separation Assurance for Autonomous Urban Air Mobility [J]. Yang, Xuxi; Wei, Peng. JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2020, 43 (08): 1473-1486
