Aircraft collision avoidance modeling and optimization using deep reinforcement learning

Cited by: 2
Authors
Park K.-W. [1]
Kim J.-H. [2]
Affiliations
[1] AI Lab, Deltax Co., Ltd
Keywords
Collision avoidance; Imitation learning; Machine learning; Optimization; Reinforcement learning
DOI
10.5302/J.ICROS.2021.21.0034
Abstract
We propose an imitation-type reinforcement learning approach to aircraft collision avoidance. The policy model is first trained under supervision to learn collision avoidance strategies derived from domain knowledge of flight mechanics and guidance. It is then updated and optimized via reinforcement learning with proximal policy optimization (PPO). The performance of the proposed approach was verified through Monte Carlo simulation runs covering a wide range of collision geometries. © ICROS 2021.
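The two training stages named in the abstract can be illustrated with a minimal sketch. The abstract does not give the loss functions, network architecture, or hyperparameters, so everything below is an assumption made for illustration: the supervised stage is shown as a mean-squared-error imitation loss against rule-based expert commands, the reinforcement learning stage as the standard PPO clipped surrogate objective, and the function names and the `clip_eps=0.2` default are hypothetical.

```python
import math


def behavior_cloning_loss(policy_actions, expert_actions):
    """Assumed stage 1: mean squared error between the policy's commands
    and the commands of a domain-knowledge (expert) guidance law."""
    pairs = list(zip(policy_actions, expert_actions))
    return sum((p - e) ** 2 for p, e in pairs) / len(pairs)


def ppo_clipped_objective(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Assumed stage 2: PPO clipped surrogate objective (to be maximized).

    The probability ratio between the updated and the data-collecting policy
    is clipped to [1 - clip_eps, 1 + clip_eps] so that a single update cannot
    move the policy far from the one initialized by imitation.
    """
    terms = []
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)
        clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped_ratio * adv))
    return sum(terms) / len(terms)
```

In a pipeline of this kind, the imitation loss would be minimized first on expert-labeled encounters, and the resulting policy would then be refined by maximizing the clipped objective on simulated collision geometries.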
Pages: 652-659
Page count: 7