A mission planning method for deep space detectors using deep reinforcement learning

Cited by: 4

Authors:

Qi, Yuheng [1]

Gu, Defeng [1]

Liu, Yuan [2]

Zhu, Jubo [1]

Wang, Jian [3,4]

Liu, Daoping [3,4]

Affiliations:

[1] Sun Yat Sen Univ, Sch Artificial Intelligence, Zhuhai 519082, Peoples R China

[2] Sun Yat Sen Univ, Sch Aeronaut & Astronaut, Shenzhen 518000, Peoples R China

[3] Sun Yat Sen Univ, TianQin Res Ctr Gravitat Phys, Zhuhai 519082, Peoples R China

[4] Sun Yat Sen Univ, Sch Phys & Astron, Zhuhai 519082, Peoples R China

Source:

AEROSPACE SCIENCE AND TECHNOLOGY | 2024 / Vol. 153

Funding:

National Key Research and Development Program of China;

Keywords:

Mission planning; Deep space detector; Scheduling; Deep reinforcement learning (DRL); Markov decision;

DOI:

10.1016/j.ast.2024.109417

CLC number:

V [Aeronautics, Astronautics];

Discipline codes:

08; 0825;

Abstract:

Mission planning for deep space detectors is a pivotal step in the successful execution of detection missions. Traditional planning approaches, which typically treat mission planning and data transmission scheduling as separate problems, adapt poorly to uncertainties and often cannot respond to unforeseen opportunity targets. This paper introduces a mission planning method for deep space detectors designed to address these challenges. First, a Markov Decision Process (MDP) model is formulated that integrates mission planning and data transmission scheduling and accounts for the planning balance between detection missions and opportunity targets. Next, a Planning Balance with Proximal Policy Optimization (PB-PPO) algorithm is proposed. Built on the Proximal Policy Optimization (PPO) algorithm, PB-PPO incorporates orthogonal initialization to give better control over parameter updates, and a dynamic learning rate strategy to accelerate convergence. Experimental results show that PB-PPO achieves rewards 4.42%, 6.78%, 18.32%, and 26.81% higher than those of the four compared algorithms. PB-PPO also maintains the planning balance between detection missions and opportunity targets, ensuring stable reward growth even when a significant number of opportunity targets are planned. In summary, by integrating the MDP model, deep reinforcement learning, and these strategies, PB-PPO offers a robust solution for detector mission planning in the complex and dynamic environment of deep space.
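The abstract names the ingredients of PB-PPO but not their exact form. As a minimal sketch under stated assumptions, and not the authors' implementation, the Python below shows standard versions of the three building blocks it mentions: orthogonal weight initialization, the PPO clipped surrogate loss, and a decaying learning rate. The function names `orthogonal_init`, `ppo_clip_loss`, and `linear_lr`, the clip range `clip_eps=0.2`, and the linear decay schedule are illustrative assumptions; the paper's "dynamic learning rate strategy" may take a different form.

```python
import math
import random


def orthogonal_init(rows, cols, gain=1.0, seed=0):
    """Orthogonal init via modified Gram-Schmidt on a random Gaussian matrix.

    Returns a rows x cols matrix (list of lists) with orthonormal rows when
    rows <= cols, or orthonormal columns when rows > cols, scaled by `gain`.
    """
    rng = random.Random(seed)
    transpose = rows > cols
    # Orthonormalize n vectors of length m, with n <= m.
    n, m = (cols, rows) if transpose else (rows, cols)
    basis = []
    while len(basis) < n:
        v = [rng.gauss(0.0, 1.0) for _ in range(m)]
        for b in basis:  # remove components along existing basis vectors
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-8:  # skip (near-)degenerate draws
            basis.append([x / norm for x in v])
    if transpose:
        basis = [list(col) for col in zip(*basis)]
    return [[gain * x for x in row] for row in basis]


def ppo_clip_loss(new_logp, old_logp, advantages, clip_eps=0.2):
    """Negative PPO clipped surrogate objective, averaged over a batch.

    clip_eps=0.2 is an assumed default, not a value taken from the paper.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(ratio * adv, clipped * adv)  # pessimistic bound
    return -total / len(advantages)


def linear_lr(step, total_steps, lr_start=3e-4, lr_end=1e-5):
    """Hypothetical 'dynamic' learning rate: linear decay, then held at lr_end."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(step / total_steps, 1.0)
    return lr_start + frac * (lr_end - lr_start)
```

For example, `ppo_clip_loss([0.0, 0.0], [0.0, 0.0], [1.0, 3.0])` returns `-2.0`: identical log-probabilities give a ratio of 1, so the loss reduces to the negative mean advantage. The clipping only bites once the new policy drifts far from the old one, which is the mechanism PPO relies on to keep updates conservative.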

Pages: 11

References: 46 in total

