Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning

Cited: 6
Authors
Wilson, Callum [1 ]
Riccardi, Annalisa [1 ]
Affiliations
[1] Univ Strathclyde, Mech & Aerosp Engn, 75 Montrose St, Glasgow G1 1XQ, Lanark, Scotland
Keywords
Intelligent Control; Reinforcement Learning; Spacecraft Powered Descent; Feedback Guidance; Attitude Control; Neural Networks; Systems; Entry
DOI
10.1007/s11081-021-09687-z
Chinese Library Classification
T [Industrial Technology]
Subject Classification Code
08
Abstract
Reinforcement learning entails many intuitive and useful approaches to solving various problems. Its main premise is to learn how to complete tasks by interacting with the environment and observing which actions yield greater reward. Methods from reinforcement learning have long been applied in aerospace and have more recently seen renewed interest in space applications. Problems in spacecraft control can benefit from intelligent techniques when faced with significant uncertainties, as is common in space environments. Solving these control problems with reinforcement learning remains a challenge, partly due to long training times and partly due to performance that is sensitive to hyperparameters requiring careful tuning. In this work we address both issues for a sample spacecraft control problem. To reduce training times compared to other approaches, we simplify the problem by discretising the action space and use a data-efficient algorithm to train the agent. Furthermore, we employ an automated approach to hyperparameter selection which optimises for a specified performance metric. Our approach is tested on a 3-DOF powered descent problem with uncertainties in the initial conditions. We run experiments with two problem formulations: one using a 'shaped' state representation to guide the agent, and one using a 'raw' state representation with unprocessed values of position, velocity and mass. The results show that an agent can learn a near-optimal policy efficiently when the action space and state space are defined appropriately. Using the raw state representation led to 'reward hacking' and poor performance, which highlights the importance of the problem and state-space formulation in successfully training reinforcement learning agents. In addition, we show that the optimal hyperparameters can vary significantly with the choice of loss function. Using two sets of hyperparameters optimised for different loss functions, we demonstrate that in both cases the agent finds near-optimal policies with performance comparable to previously applied methods.
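To illustrate the kind of update the abstract refers to, the following is a minimal Python sketch of Q-learning over a discretised thrust action set. It is not the authors' implementation: the paper uses function approximation over the continuous descent state, and the throttle levels, state indexing, table size and hyperparameter values below are illustrative assumptions only.

    import numpy as np
    from itertools import product

    # Illustrative discretisation of a 3-DOF thrust command: each axis takes one
    # of a few fixed throttle fractions, turning the continuous control problem
    # into a finite action set that a Q-learning agent can search over directly.
    THRUST_LEVELS = (0.0, 0.5, 1.0)                             # assumed throttle fractions
    ACTIONS = np.array(list(product(THRUST_LEVELS, repeat=3)))  # 27 discrete thrust commands

    def epsilon_greedy(q_table, state_idx, epsilon, rng):
        """Random discrete thrust command with probability epsilon, else the greedy one."""
        if rng.random() < epsilon:
            return int(rng.integers(len(ACTIONS)))
        return int(np.argmax(q_table[state_idx]))

    def q_learning_update(q_table, state_idx, action_idx, reward, next_state_idx,
                          alpha=0.1, gamma=0.99):
        """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
        td_target = reward + gamma * np.max(q_table[next_state_idx])
        q_table[state_idx, action_idx] += alpha * (td_target - q_table[state_idx, action_idx])

    # Hypothetical setup: a coarse indexing of the descent state into 1000 bins.
    rng = np.random.default_rng(0)
    q_table = np.zeros((1000, len(ACTIONS)))

In the paper the state (position, velocity and mass, or a shaped transformation of them) is continuous, so the table above would be replaced by a learned approximator, and values such as alpha, gamma and epsilon are exactly the kind of hyperparameters the automated selection procedure is used to tune.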
Pages: 223-255
Page count: 33
Related papers
50 records in total
  • [1] Improving the efficiency of reinforcement learning for a spacecraft powered descent with Q-learning
    Wilson, Callum
    Riccardi, Annalisa
    OPTIMIZATION AND ENGINEERING, 2023, 24 : 223 - 255
  • [2] Deep Reinforcement Learning: From Q-Learning to Deep Q-Learning
    Tan, Fuxiao
    Yan, Pengfei
    Guan, Xinping
    NEURAL INFORMATION PROCESSING (ICONIP 2017), PT IV, 2017, 10637 : 475 - 483
  • [3] Fuzzy Q-Learning for generalization of reinforcement learning
    Berenji, HR
    FUZZ-IEEE '96 - PROCEEDINGS OF THE FIFTH IEEE INTERNATIONAL CONFERENCE ON FUZZY SYSTEMS, VOLS 1-3, 1996, : 2208 - 2214
  • [4] Deep Reinforcement Learning with Double Q-Learning
    van Hasselt, Hado
    Guez, Arthur
    Silver, David
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2094 - 2100
  • [5] Reinforcement learning guidance law of Q-learning
    Zhang Q.
    Ao B.
    Zhang Q.
    Xi Tong Gong Cheng Yu Dian Zi Ji Shu/Systems Engineering and Electronics, 2020, 42 (02): : 414 - 419
  • [6] Feasible Q-Learning for Average Reward Reinforcement Learning
    Jin, Ying
    Blanchet, Jose
    Gummadi, Ramki
    Zhou, Zhengyuan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [7] Mildly Conservative Q-Learning for Offline Reinforcement Learning
    Lyu, Jiafei
    Ma, Xiaoteng
    Li, Xiu
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [8] Adaptable Conservative Q-Learning for Offline Reinforcement Learning
    Qiu, Lyn
    Li, Xu
    Liang, Lenghan
    Sun, Mingming
    Yan, Junchi
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT III, 2024, 14427 : 200 - 212
  • [9] Reinforcement distribution in fuzzy Q-learning
    Bonarini, Andrea
    Lazaric, Alessandro
    Montrone, Francesco
    Restelli, Marcello
    FUZZY SETS AND SYSTEMS, 2009, 160 (10) : 1420 - 1443
  • [10] An Efficient Hardware Implementation of Reinforcement Learning: The Q-Learning Algorithm
    Spano, Sergio
    Cardarilli, Gian Carlo
    Di Nunzio, Luca
    Fazzolari, Rocco
    Giardino, Daniele
    Matta, Marco
    Nannarelli, Alberto
    Re, Marco
    IEEE ACCESS, 2019, 7 : 186340 - 186351