Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials

Cited by: 22
Authors
Pope A.P. [1 ,2 ]
Ide J.S. [1 ]
Mićović D. [1 ]
Diaz H. [1 ]
Twedt J.C. [1 ]
Alcedo K. [1 ]
Walker T.T. [1 ]
Rosenbluth D. [1 ]
Ritholtz L. [1 ]
Javorsek D. [3 ]
Affiliations
[1] Applied AI Team, Lockheed Martin Artificial Intelligence Center, Shelton, CT 06484, USA
[2] Primordial Labs, New Haven, CT 06510, USA
[3] United States Air Force, Nellis Air Force Base, Las Vegas, NV 89191, USA
Source
IEEE Transactions on Artificial Intelligence | 2023, Vol. 4, No. 6
Keywords
Air combat; artificial intelligence; autonomy; deep reinforcement learning; hierarchical reinforcement learning
DOI
10.1109/TAI.2022.3222143
Abstract
Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of the high risk and complexity involved, the adoption of AI for autonomous combat systems has been a long-standing challenge. To address these issues, DARPA's AlphaDogfight Trials (ADT) program sought to vet the feasibility of, and increase trust in, AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves this high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies, each specialized to excel in a specific region of the state space. Both levels of the hierarchy are trained using off-policy, maximum-entropy methods, with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and finished second in the ADT championship event. © 2020 IEEE.
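
The abstract describes a two-level hierarchy: a high-level policy selector that, at each step, hands control to one of several separately trained low-level specialist policies. As a rough illustration of that control flow only (not the authors' implementation: the class names, the hand-coded selection rule, and the observation/command layouts below are all hypothetical stand-ins for what the paper learns with off-policy, maximum-entropy RL), a minimal Python sketch:

# Minimal sketch of the hierarchical controller described in the abstract:
# a high-level selector chooses which pretrained low-level specialist acts.
# Illustration only; all names, shapes, and the selection rule are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np


@dataclass
class LowLevelPolicy:
    """A separately trained specialist for one region of the state space."""
    name: str
    act: Callable[[np.ndarray], np.ndarray]  # observation -> control command


class PolicySelector:
    """High-level policy: picks a specialist from the current observation.

    In the paper both levels are learned with off-policy maximum-entropy RL;
    here the choice is stubbed with a fixed rule purely to show control flow.
    """

    def __init__(self, specialists: Sequence[LowLevelPolicy]):
        self.specialists = list(specialists)

    def select(self, obs: np.ndarray) -> int:
        # Hypothetical rule: use specialist 0 when the (assumed) relative
        # bearing feature obs[0] is small, otherwise specialist 1.
        return 0 if abs(obs[0]) < 0.5 else 1

    def act(self, obs: np.ndarray) -> np.ndarray:
        specialist = self.specialists[self.select(obs)]
        return specialist.act(obs)


if __name__ == "__main__":
    # Two stand-in specialists returning fixed stick/rudder/throttle commands.
    offensive = LowLevelPolicy("offensive", lambda o: np.array([1.0, 0.0, 0.9]))
    defensive = LowLevelPolicy("defensive", lambda o: np.array([-1.0, 0.5, 1.0]))
    controller = PolicySelector([offensive, defensive])

    obs = np.array([0.2, 0.0, 0.0])  # toy observation
    print(controller.act(obs))       # command from the "offensive" specialist

In the paper the selector is itself a learned maximum-entropy policy trained over the fixed specialists; the hand-coded rule above only stands in for that learned choice.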
Pages: 1371-1385 (14 pages)