Orbital Interception Pursuit Strategy for Random Evasion Using Deep Reinforcement Learning

Cited: 15
Authors
Jiang, Rui [1 ]
Ye, Dong [1 ]
Xiao, Yan [1 ]
Sun, Zhaowei [1 ]
Zhang, Zeming [1 ,2 ]
Affiliations
[1] Harbin Inst Technol, Sch Astronaut, Res Ctr Satellite Technol, Harbin, Peoples R China
[2] Politecn Milan, Dept Aerosp Sci & Technol, Space Missions Engn Lab, Milan, Italy
Source
SPACE: SCIENCE & TECHNOLOGY | 2023, Vol. 3
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning;
DOI
10.34133/space.0086
CLC Classification Number
V [Aeronautics, Astronautics];
Discipline Classification Codes
08 ; 0825 ;
Abstract
To address the one-to-one orbital pursuit-evasion problem in which a noncooperative evader spacecraft adopts a random maneuver strategy, an interception strategy for the pursuer with a decision-making training mechanism based on deep reinforcement learning is proposed. Its core purpose is to improve the interception success rate in an environment with high uncertainty. First, a multi-impulse orbit transfer model of the pursuer and evader is established, and a modular deep reinforcement learning training method is built. Second, an effective reward mechanism is proposed to train the pursuer to choose the impulse direction and impulse interval of each orbit transfer and to learn a successful interception strategy that is optimal in fuel and time. Finally, with the evader making a random maneuver decision in each training episode, the trained decision-making strategy is applied to the pursuer, and the corresponding interception success rate is analyzed. The results show that the trained pursuer obtains a universal and adaptable interception strategy: in each round of pursuit-evasion, regardless of the evader's random maneuvers, the pursuer adopts similar near-optimal decisions to handle the high-dimensional environment and thoroughly random state space while maintaining a high interception success rate.
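The abstract describes an episode-based training setup: at each decision step the pursuer selects an impulse direction and a coast interval, the evader maneuvers randomly, and the reward trades interception success against fuel and elapsed time. The Python sketch below illustrates how one such episode might be rolled out; it is not the paper's code. The free-drift "coast" dynamics, capture radius, reward weights, and the greedy_policy placeholder (which a trained deep reinforcement learning agent would replace) are all illustrative assumptions, not the authors' multi-impulse orbit transfer model.

```python
import numpy as np

# Illustrative sketch of one pursuit-evasion training episode (assumed setup,
# not the paper's implementation). The pursuer applies an impulsive delta-v and
# coasts for an interval; the evader maneuvers randomly at each decision step.

CAPTURE_RADIUS = 1.0   # interception distance threshold (assumed units)
MAX_IMPULSES = 20      # maximum impulses per episode (assumed)
FUEL_WEIGHT = 0.1      # reward penalty per unit of delta-v (assumed)
TIME_WEIGHT = 0.01     # reward penalty per unit of coast time (assumed)

def step(pos, vel, dv, dt):
    """Apply an impulsive delta-v, then coast for dt under simple free drift."""
    vel = vel + dv
    pos = pos + vel * dt
    return pos, vel

def run_episode(policy, rng):
    """Roll out one pursuit-evasion episode and return its total reward."""
    p_pos, p_vel = np.zeros(3), np.zeros(3)                   # pursuer state
    e_pos, e_vel = np.array([10.0, 0.0, 0.0]), np.zeros(3)    # evader state
    total_reward = 0.0
    for _ in range(MAX_IMPULSES):
        rel = e_pos - p_pos
        # The agent chooses an impulse vector and a coast interval.
        dv, dt = policy(rel, p_vel, e_vel)
        p_pos, p_vel = step(p_pos, p_vel, dv, dt)
        # The evader applies a random maneuver at each decision step.
        e_dv = rng.normal(scale=0.2, size=3)
        e_pos, e_vel = step(e_pos, e_vel, e_dv, dt)
        # Penalize fuel use and elapsed time; reward interception.
        total_reward -= FUEL_WEIGHT * np.linalg.norm(dv) + TIME_WEIGHT * dt
        if np.linalg.norm(e_pos - p_pos) < CAPTURE_RADIUS:
            total_reward += 100.0  # interception bonus (assumed value)
            break
    return total_reward

def greedy_policy(rel, p_vel, e_vel):
    """Placeholder policy: thrust toward the evader; a DRL agent would replace this."""
    direction = rel / (np.linalg.norm(rel) + 1e-9)
    return 0.5 * direction, 1.0

rng = np.random.default_rng(0)
print("episode reward:", run_episode(greedy_policy, rng))
```

A deep reinforcement learning agent would be trained by repeating such episodes, with the evader's random maneuvers providing the environmental uncertainty that the learned interception strategy must cope with.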
Pages: 14