Model-Based Reinforcement Learning and Neural-Network-Based Policy Compression for Spacecraft Rendezvous on Resource-Constrained Embedded Systems

Cited by: 7
Authors
Yang, Zhibin [1 ]
Xing, Linquan [1 ]
Gu, Zonghua [2 ]
Xiao, Yingmin [1 ]
Zhou, Yong [1 ]
Huang, Zhiqiu [1 ]
Xue, Lei [3 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Sch Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Umea Univ, Dept Appl Phys & Elect, S-90187 Umea, Sweden
[3] Shanghai Aerosp Elect Technol Inst, Shanghai 201100, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Space vehicles; Artificial neural networks; Mathematical models; Vehicle dynamics; Reinforcement learning; Predictive models; Computational modeling; Formal verification; Markov decision process (MDP); model-based reinforcement learning; spacecraft rendezvous guidance; DYNAMICS;
DOI
10.1109/TII.2022.3192085
CLC Classification Number
TP [Automation technology, computer technology];
Discipline Code
0812;
Abstract
Autonomous spacecraft rendezvous is highly challenging in increasingly complex space missions. In this article, we present MBRL4SRG, a model-based reinforcement learning approach for spacecraft rendezvous guidance. We build a Markov decision process (MDP) model based on the Clohessy-Wiltshire equations of spacecraft relative dynamics, solve it with dynamic programming, and obtain a decision table as the optimal agent policy. Since the onboard computing system of a spacecraft is resource constrained in both memory size and processing speed, we train a neural network (NN) as a compact and efficient function approximation of the tabular policy. The NN outputs are formally verified with the verification tool ReluVal, and the verification results show that the robustness of the NN is maintained. Experimental results indicate that MBRL4SRG achieves lower computational overhead than the conventional proportional-integral-derivative (PID) algorithm, and offers higher trustworthiness and better computational efficiency during training than model-free reinforcement learning algorithms.
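The MDP model in the abstract is built on the Clohessy-Wiltshire (CW) equations, which describe the relative motion of a chaser spacecraft in the local-vertical/local-horizontal frame of a target on a circular orbit. As an illustration only (not the paper's implementation), a minimal sketch of propagating the unforced CW dynamics with a fixed-step RK4 integrator, assuming a target mean motion `n` in rad/s, might look like:

```python
import numpy as np

def cw_derivatives(state, n):
    """Unforced Clohessy-Wiltshire relative dynamics (circular target orbit).

    state = [x, y, z, vx, vy, vz] in the target's LVLH frame:
      x radial, y along-track, z cross-track; n = target mean motion (rad/s).
    """
    x, y, z, vx, vy, vz = state
    ax = 3.0 * n**2 * x + 2.0 * n * vy   # radial: orbit-rate coupling
    ay = -2.0 * n * vx                   # along-track: Coriolis term
    az = -n**2 * z                       # cross-track: simple harmonic motion
    return np.array([vx, vy, vz, ax, ay, az])

def propagate(state, n, dt, steps):
    """Propagate the CW equations with a classical RK4 integrator."""
    s = np.asarray(state, dtype=float)
    for _ in range(steps):
        k1 = cw_derivatives(s, n)
        k2 = cw_derivatives(s + 0.5 * dt * k1, n)
        k3 = cw_derivatives(s + 0.5 * dt * k2, n)
        k4 = cw_derivatives(s + dt * k3, n)
        s = s + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return s
```

A useful sanity check is that the cross-track channel decouples: with zero cross-track velocity, z(t) = z0·cos(nt), while a state with zero in-plane components stays in-plane-zero. In the paper's setting, thrust accelerations would enter these dynamics as the MDP actions, and dynamic programming over a discretized state grid would yield the decision table that the NN then compresses.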
Pages: 1107-1116
Page count: 10