Hierarchical Reinforcement Learning for Air Combat at DARPA's AlphaDogfight Trials

Cited by: 22
Authors
Pope A.P. [1 ,2 ]
Ide J.S. [1 ]
Mićović D. [1 ]
Diaz H. [1 ]
Twedt J.C. [1 ]
Alcedo K. [1 ]
Walker T.T. [1 ]
Rosenbluth D. [1 ]
Ritholtz L. [1 ]
Javorsek D. [3 ]
Affiliations
[1] Applied AI Team, Lockheed Martin Artificial Intelligence Center, Shelton, CT 06484, USA
[2] Primordial Labs, New Haven, CT 06510, USA
[3] United States Air Force, Nellis Air Force Base, Las Vegas, NV 89191, USA
Source
IEEE Transactions on Artificial Intelligence | 2023, Vol. 4, No. 6
Keywords
Air combat; artificial intelligence; autonomy; deep reinforcement learning; hierarchical reinforcement learning
DOI
10.1109/TAI.2022.3222143
Abstract
Autonomous control in high-dimensional, continuous state spaces is a persistent and important challenge in the fields of robotics and artificial intelligence. Because of the high risk and complexity involved, the adoption of AI for autonomous combat systems has been a long-standing challenge. To address these issues, DARPA's AlphaDogfight Trials (ADT) program sought to vet the feasibility of, and increase trust in, AI for autonomously piloting an F-16 in simulated air-to-air combat. Our submission to ADT solves this high-dimensional, continuous control problem using a novel hierarchical deep reinforcement learning approach consisting of a high-level policy selector and a set of separately trained low-level policies, each specialized to excel in a specific region of the state space. Both levels of the hierarchy are trained using off-policy, maximum-entropy methods, with expert knowledge integrated through reward shaping. Our approach outperformed human expert pilots and finished second in the ADT championship event. © 2020 IEEE.
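
The abstract describes a two-level hierarchy: a high-level policy selector that, at each step, hands control to one of several separately trained low-level specialist policies. As a rough illustration of that control flow only (not the authors' implementation: the class names, the hand-coded selection rule, and the observation/command layouts below are all hypothetical stand-ins for what the paper learns with off-policy, maximum-entropy RL), a minimal Python sketch:

# Minimal sketch of the hierarchical controller described in the abstract:
# a high-level selector chooses which pretrained low-level specialist acts.
# Illustration only; all names, shapes, and the selection rule are assumptions.
from dataclasses import dataclass
from typing import Callable, Sequence
import numpy as np


@dataclass
class LowLevelPolicy:
    """A separately trained specialist for one region of the state space."""
    name: str
    act: Callable[[np.ndarray], np.ndarray]  # observation -> control command


class PolicySelector:
    """High-level policy: picks a specialist from the current observation.

    In the paper both levels are learned with off-policy maximum-entropy RL;
    here the choice is stubbed with a fixed rule purely to show control flow.
    """

    def __init__(self, specialists: Sequence[LowLevelPolicy]):
        self.specialists = list(specialists)

    def select(self, obs: np.ndarray) -> int:
        # Hypothetical rule: use specialist 0 when the (assumed) relative
        # bearing feature obs[0] is small, otherwise specialist 1.
        return 0 if abs(obs[0]) < 0.5 else 1

    def act(self, obs: np.ndarray) -> np.ndarray:
        specialist = self.specialists[self.select(obs)]
        return specialist.act(obs)


if __name__ == "__main__":
    # Two stand-in specialists returning fixed stick/rudder/throttle commands.
    offensive = LowLevelPolicy("offensive", lambda o: np.array([1.0, 0.0, 0.9]))
    defensive = LowLevelPolicy("defensive", lambda o: np.array([-1.0, 0.5, 1.0]))
    controller = PolicySelector([offensive, defensive])

    obs = np.array([0.2, 0.0, 0.0])  # toy observation
    print(controller.act(obs))       # command from the "offensive" specialist

In the paper the selector is itself a learned maximum-entropy policy trained over the fixed specialists; the hand-coded rule above only stands in for that learned choice.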
Pages: 1371-1385 (14 pages)