Model-Based Reinforcement Learning and Neural-Network-Based Policy Compression for Spacecraft Rendezvous on Resource-Constrained Embedded Systems

Cited by: 7
Authors
Yang, Zhibin [1 ]
Xing, Linquan [1 ]
Gu, Zonghua [2 ]
Xiao, Yingmin [1 ]
Zhou, Yong [1 ]
Huang, Zhiqiu [1 ]
Xue, Lei [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden
[3] Shanghai Aerosp Elect Technol Inst, Shanghai 201100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Space vehicles; Artificial neural networks; Mathematical models; Vehicle dynamics; Reinforcement learning; Predictive models; Computational modeling; Formal verification; Markov decision process (MDP); model-based reinforcement learning; spacecraft rendezvous guidance; DYNAMICS;
DOI
10.1109/TII.2022.3192085
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Autonomous spacecraft rendezvous is highly challenging in increasingly complex space missions. In this article, we present MBRL4SRG, a model-based reinforcement learning approach for spacecraft rendezvous guidance. We build a Markov decision process (MDP) model based on the Clohessy-Wiltshire equations of spacecraft relative dynamics, solve it with dynamic programming, and obtain a decision table as the optimal agent policy. Since the onboard computing system of a spacecraft is resource constrained in both memory size and processing speed, we train a neural network (NN) as a compact and efficient function approximation of the tabular policy. The NN outputs are formally verified with the verification tool ReluVal, and the verification results show that the robustness of the NN is maintained. Experimental results indicate that MBRL4SRG achieves lower computational overhead than the conventional proportional-integral-derivative (PID) algorithm, and offers higher trustworthiness and better computational efficiency during training than model-free reinforcement learning algorithms.
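The MDP model in the abstract is built on the Clohessy-Wiltshire (CW) equations, which describe the relative motion of a chaser spacecraft in the local-vertical/local-horizontal frame of a target on a circular orbit. As an illustration only (not the paper's implementation), a minimal sketch of propagating the unforced CW dynamics with a fixed-step RK4 integrator, assuming a target mean motion `n` in rad/s, might look like:

```python
import numpy as np

def cw_derivatives(state, n):
    """Unforced Clohessy-Wiltshire relative dynamics (circular target orbit).

    state = [x, y, z, vx, vy, vz] in the target's LVLH frame:
      x radial, y along-track, z cross-track; n = target mean motion (rad/s).
    """
    x, y, z, vx, vy, vz = state
    ax = 3.0 * n**2 * x + 2.0 * n * vy   # radial: orbit-rate coupling
    ay = -2.0 * n * vx                   # along-track: Coriolis term
    az = -n**2 * z                       # cross-track: simple harmonic motion
    return np.array([vx, vy, vz, ax, ay, az])

def propagate(state, n, dt, steps):
    """Propagate the CW equations with a classical RK4 integrator."""
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        k1 = cw_derivatives(s, n)
        k2 = cw_derivatives(s + 0.5 * dt * k1, n)
        k3 = cw_derivatives(s + 0.5 * dt * k2, n)
        k4 = cw_derivatives(s + dt * k3, n)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s
```

A useful sanity check is that the cross-track channel decouples: with zero cross-track velocity, z(t) = z0·cos(nt), while a state with zero in-plane components stays in-plane-zero. In the paper's setting, thrust accelerations would enter these dynamics as the MDP actions, and dynamic programming over a discretized state grid would yield the decision table that the NN then compresses.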
Pages: 1107-1116
Page count: 10