Depth Control of Model-Free AUVs via Reinforcement Learning

Cited by: 100
Authors
Wu, Hui [1, 2]
Song, Shiji [1, 2]
You, Keyou [1, 2]
Wu, Cheng [1, 2]
Affiliations
[1] Tsinghua University, Department of Automation, Beijing 100084, People's Republic of China
[2] Tsinghua University, TNList, Beijing 100084, People's Republic of China
Source
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2019, Vol. 49, No. 12
Funding
National Science Foundation (USA)
Keywords
Autonomous underwater vehicle (AUV); depth control; deterministic policy gradient (DPG); neural network; prioritized experience replay; reinforcement learning (RL); UNDERWATER VEHICLE; SYSTEMS;
DOI
10.1109/TSMC.2017.2785794
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper, we consider depth control problems of an autonomous underwater vehicle (AUV) for tracking the desired depth trajectories. Due to the unknown dynamical model and the coupling between surge and yaw motions of the AUV, the problems cannot be effectively solved by most model-based or proportional-integral-derivative-like controllers. To this end, we formulate the depth control problems of the AUV as continuous-state, continuous-action Markov decision processes under unknown transition probabilities. Based on the deterministic policy gradient theorem and neural network approximation, we propose a model-free reinforcement learning (RL) algorithm that learns a state-feedback controller from sampled trajectories of the AUV. To improve the performance of the RL algorithm, we further propose a batch-learning scheme that replays previous prioritized trajectories. We illustrate with simulations that our model-free method performs comparably to model-based controllers. Moreover, we validate the effectiveness of the proposed RL algorithm on a seafloor data set sampled from the South China Sea.
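The abstract mentions batch learning by replaying prioritized experience. As a rough illustration only, the sketch below shows a generic proportional prioritized replay buffer; it is not taken from the paper. The class name, the `alpha` and `eps` parameters, the use of an absolute TD error as the priority, and the storage of single transitions rather than whole trajectories are all assumptions made for the example.

```python
import random


class PrioritizedReplayBuffer:
    """Hypothetical proportional prioritized replay buffer.

    Illustrative only: the paper's own prioritization scheme may differ.
    """

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity = capacity    # maximum number of stored transitions
        self.alpha = alpha          # assumed exponent: 0 gives uniform sampling
        self.eps = eps              # assumed offset so no priority is exactly zero
        self.items = []
        self.priorities = []

    def add(self, transition, td_error):
        # Drop the oldest transition once the buffer is full.
        if len(self.items) >= self.capacity:
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng=random):
        # Draw with replacement, with probability proportional to priority.
        return rng.choices(self.items, weights=self.priorities, k=batch_size)
```

In a training loop, a batch drawn with `sample` would feed the critic and actor updates, so transitions with larger errors are revisited more often than under uniform sampling.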
Pages: 2499-2510
Page count: 12