Docking Control of an Autonomous Underwater Vehicle Using Reinforcement Learning

Times Cited: 46
Authors
Anderlini, Enrico [1]
Parker, Gordon G. [2]
Thomas, Giles [1]
Affiliations
[1] UCL, Dept Mech Engn, London WC1E 7JE, England
[2] Michigan Technol Univ, Dept Mech Engn Engn Mech, Houghton, MI 49931 USA
Source
APPLIED SCIENCES-BASEL | 2019, Vol. 9, Issue 17
Keywords
autonomous underwater vehicle; reinforcement learning; optimal control; level control; retrieval; system
DOI
10.3390/app9173456
Chinese Library Classification (CLC) Number
O6 [Chemistry]
Discipline Classification Code
0703
Abstract
To achieve persistent operation in the future, autonomous underwater vehicles (AUVs) will need to dock autonomously onto a charging station. Here, reinforcement learning strategies were applied for the first time to control the docking of an AUV onto a fixed platform in a simulation environment. Two reinforcement learning schemes were investigated: deep deterministic policy gradient (DDPG), which uses continuous state and action spaces, and deep Q-network (DQN), which uses a continuous state space with discrete actions. For DQN, the discrete actions were defined as step changes in the control input signals. The performance of the reinforcement learning strategies was compared with classical and optimal control techniques. The control actions selected by DDPG suffer from chattering effects due to the hyperbolic tangent output layer in the actor. Conversely, DQN offers the best compromise between short docking time and low control effort, whilst meeting the docking requirements. Although the reinforcement learning algorithms incur a very high computational cost during training, they are five orders of magnitude faster than optimal control at deployment time, thus enabling an online implementation. Reinforcement learning therefore achieves performance similar to that of optimal control at a much lower computational cost at deployment, whilst also providing a more general framework.
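The record does not reproduce the paper's controllers, but the abstract's description of the two action spaces lends itself to a short illustration. Below is a minimal Python sketch, assuming placeholder actuator names, limits, and step sizes that are not taken from the paper: it shows DQN-style actions defined as step changes to the current control input signals, and a DDPG-style continuous action obtained by scaling a hyperbolic-tangent (tanh) actor output to the actuator range.

import numpy as np

# Minimal sketch (not the paper's code): the two action-space formulations
# described in the abstract, set up for an AUV docking controller.
# The control inputs, limits, and step sizes below are assumed placeholders
# (e.g. propeller thrust in N and stern-plane angle in rad).

U_MIN = np.array([-50.0, -0.35])   # assumed lower actuator bounds
U_MAX = np.array([50.0, 0.35])     # assumed upper actuator bounds
STEP = np.array([5.0, 0.05])       # assumed per-decision step change (DQN)

# DQN-style discrete actions: hold, or step one control channel up or down.
DISCRETE_ACTIONS = [np.zeros(2)]
for i in range(2):
    for sign in (-1.0, 1.0):
        delta = np.zeros(2)
        delta[i] = sign * STEP[i]
        DISCRETE_ACTIONS.append(delta)

def apply_discrete_action(u_prev, action_index):
    """Apply a step-change action to the previous control input and saturate."""
    return np.clip(u_prev + DISCRETE_ACTIONS[action_index], U_MIN, U_MAX)

def ddpg_action(raw_actor_output):
    """DDPG-style continuous action: a tanh output layer bounds the actor
    output, which is then rescaled to the actuator range. Saturation of the
    tanh is one reason the selected actions can chatter near the bounds."""
    return U_MIN + 0.5 * (np.tanh(raw_actor_output) + 1.0) * (U_MAX - U_MIN)

if __name__ == "__main__":
    u = apply_discrete_action(np.zeros(2), action_index=2)   # +5 N thrust step
    print("DQN control input after one step change:", u)
    print("DDPG control input from raw actor output [0.3, -1.2]:",
          ddpg_action(np.array([0.3, -1.2])))

The hold action and the saturation at the actuator bounds are modelling conveniences of this sketch rather than details reported in the record.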
Number of pages: 24