Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments

Cited by: 23
Authors
Bohn, Eivind [1 ]
Coates, Erlend M. [2 ]
Reinhardt, Dirk [2 ]
Johansen, Tor Arne [2 ]
Affiliations
[1] SINTEF Digital, Dept. of Mathematics and Cybernetics, N-0373 Oslo, Norway
[2] Norwegian University of Science and Technology (NTNU), Centre for Autonomous Marine Operations and Systems, Dept. of Engineering Cybernetics, N-7491 Trondheim, Norway
Keywords
Attitude control; Reinforcement learning; Data models; Computational modeling; Autonomous aerial vehicles; Aircraft; Vehicle dynamics; Deep reinforcement learning (DRL); Sim-to-real; Soft actor-critic (SAC); Quadrotor
DOI
10.1109/TNNLS.2023.3263430
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem in part due to uncertain nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motions. Current state-of-the-art autopilots are based on linear control and are thus limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method that can handle complex nonlinear dynamics by automatically discovering optimal control laws through interaction with the controlled system. We show in this article that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as 3 min of flight data. We initially train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating performance comparable to that of the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller with no further online learning required. Learning with significant actuation delay and diversified simulated dynamics was found to be crucial for successful transfer to control of the real UAV. In addition to a qualitative comparison with the ArduPlane autopilot, we present a quantitative assessment based on linear analysis to better understand the learning controller's behavior.
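The two transfer ingredients the abstract credits, training under significant actuation delay and over diversified (randomized) simulated dynamics, can be sketched as an environment wrapper around a flight simulator. The Python sketch below is illustrative only and is not the authors' implementation; the simulator object `sim` and its attributes (`nominal_params`, `params`, `trim_action`), as well as the parameter values `delay_steps` and `randomize_pct`, are hypothetical stand-ins.

    import random
    from collections import deque

    class DelayedRandomizedEnv:
        """Illustrative wrapper: domain randomization plus actuation delay.

        At each episode reset, the simulator's dynamics parameters are
        re-sampled around their nominal values (diversified dynamics);
        actions are queued so the plant only sees them delay_steps
        control periods later (actuation delay).
        """

        def __init__(self, sim, delay_steps=3, randomize_pct=0.2):
            self.sim = sim                    # hypothetical flight-dynamics simulator
            self.delay_steps = delay_steps    # delay measured in control periods
            self.randomize_pct = randomize_pct
            self.action_queue = deque()

        def reset(self):
            # Perturb each nominal dynamics coefficient by up to +/- randomize_pct
            # so the policy cannot overfit to a single simulated airframe.
            for name, nominal in self.sim.nominal_params.items():
                factor = 1.0 + random.uniform(-self.randomize_pct, self.randomize_pct)
                self.sim.params[name] = nominal * factor
            # Pre-fill the queue with trim commands so the very first actions
            # are delayed just like all subsequent ones.
            self.action_queue = deque([self.sim.trim_action] * self.delay_steps)
            return self.sim.reset()

        def step(self, action):
            # Enqueue the fresh command; apply the one issued delay_steps ago.
            self.action_queue.append(action)
            delayed_action = self.action_queue.popleft()
            return self.sim.step(delayed_action)

An off-policy learner such as SAC trained against many such randomized, delayed episodes sees a distribution of plants rather than a single one, which is the mechanism the abstract points to for the successful sim-to-real transfer.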
Pages: 3168-3180
Page count: 13