Deep reinforcement learning for six degree-of-freedom planetary landing

Cited by: 146
Authors
Gaudet, Brian [1]
Linares, Richard [2]
Furfaro, Roberto [1]
Affiliations
[1] Univ Arizona, Dept Syst & Ind Engn, Tucson, AZ 85721 USA
[2] MIT, Dept Aeronaut & Astronaut, Cambridge, MA 02139 USA
Keywords
Reinforcement learning; Mars landing; Integrated guidance and control; Artificial intelligence; Autonomous maneuvers; Descent
DOI
10.1016/j.asr.2019.12.030
Chinese Library Classification
V [Aeronautics, Astronautics]
Discipline Code
08; 0825
Abstract
This work develops a deep reinforcement learning based approach for Six Degree-of-Freedom (DOF) planetary powered descent and landing. Future Mars missions will require advanced guidance, navigation, and control algorithms for the powered descent phase to target specific surface locations and achieve pinpoint accuracy (landing error ellipse <5 m radius). This requires both a navigation system capable of estimating the lander's state in real-time and a guidance and control system that can map the estimated lander state to a commanded thrust for each lander engine. In this paper, we present a novel integrated guidance and control algorithm designed by applying the principles of reinforcement learning theory. The latter is used to learn a policy mapping the lander's estimated state directly to a commanded thrust for each engine, resulting in accurate and almost fuel-optimal trajectories over a realistic deployment ellipse. Specifically, we use proximal policy optimization, a policy gradient method, to learn the policy. Another contribution of this paper is the use of different discount rates for terminal and shaping rewards, which significantly enhances optimization performance. We present simulation results demonstrating the guidance and control system's performance in a 6-DOF simulation environment and demonstrate robustness to noise and system parameter uncertainty. (C) 2020 COSPAR. Published by Elsevier Ltd. All rights reserved.
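The abstract's idea of applying different discount rates to terminal and shaping rewards can be sketched as a return computation. This is a minimal illustration only: the function name `dual_discount_return`, the default discount values, and the exact form of the return are assumptions for the sketch, not details taken from the paper.

```python
def dual_discount_return(shaping_rewards, terminal_reward,
                         gamma_shaping=0.90, gamma_terminal=0.99):
    """Discounted return from t=0 where per-step shaping rewards use
    gamma_shaping and the single terminal reward uses gamma_terminal.

    Using a smaller discount on shaping rewards limits how far their
    influence propagates, while a larger discount on the terminal
    reward keeps the landing outcome visible early in the trajectory.
    """
    horizon = len(shaping_rewards)
    # Shaping rewards r_t accrue at every step with their own discount.
    ret = sum(gamma_shaping ** t * r for t, r in enumerate(shaping_rewards))
    # Terminal reward arrives once, at the end of the episode.
    ret += gamma_terminal ** horizon * terminal_reward
    return ret
```

For example, with `gamma_shaping=0.5` and `gamma_terminal=1.0`, two unit shaping rewards followed by a terminal reward of 10 give a return of 1 + 0.5 + 10 = 11.5.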
Pages: 1723-1741 (19 pages)
References (37 total; entries [21]-[30] shown)
[21] Ng, A.Y., 2000, Proceedings of the International Conference on Machine Learning.
[22] Rao, Anil V.; Benson, David A.; Darby, Christopher; Patterson, Michael A.; Francolin, Camila; Sanders, Ilyssa; Huntington, Geoffrey T. Algorithm 902: GPOPS, A MATLAB Software for Solving Multiple-Phase Optimal Control Problems Using the Gauss Pseudospectral Method. ACM Transactions on Mathematical Software, 2010, 37 (02).
[23] Ross, S., 2011, JMLR Workshop and Conference Proceedings, p. 627.
[24] Sanchez-Sanchez, Carlos; Izzo, Dario. Real-Time Optimal Control via Deep Neural Networks: Study on Landing Problems. Journal of Guidance, Control, and Dynamics, 2018, 41 (05): 1122-1135.
[25] Schull, Jon. Enabling the Future: Crowdsourced 3D-printed Prosthetics as a Model for Open Source Assistive Technology Innovation and Mutual Aid. ASSETS'15: Proceedings of the 17th International ACM SIGACCESS Conference on Computers & Accessibility, 2015: 1-1.
[26] Schulman, J., 2017, arXiv.
[27] Schulman, J., 2015, Proceedings of Machine Learning Research, Vol. 37, p. 1889.
[28] Shotwell, R. Phoenix - the first Mars Scout mission. Acta Astronautica, 2005, 57 (2-8): 121-134.
[29] Shuster, M.D., 1993, Journal of the Astronautical Sciences, Vol. 41, p. 439.
[30] Silver, D., 2014, Proceedings of Machine Learning Research, Vol. 32.