MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles

Cited by: 20
Authors
Cai, Wenqi [1]
Kordabad, Arash B. [1]
Esfahani, Hossein N. [1]
Lekkas, Anastasios M. [1]
Gros, Sebastien [1]
Affiliations
[1] Norwegian University of Science and Technology (NTNU), Department of Engineering Cybernetics, Trondheim, Norway
Source
2021 60th IEEE Conference on Decision and Control (CDC) | 2021
DOI
10.1109/CDC45484.2021.9683750
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this work, we propose a Model Predictive Control (MPC)-based Reinforcement Learning (RL) method for Autonomous Surface Vehicles (ASVs). The objective is to find a policy that optimizes the closed-loop performance of a simplified freight mission, which comprises collision-free path following, autonomous docking, and a smooth transition between the two. We use a parametrized MPC scheme to approximate the optimal policy; the scheme includes path-following and docking costs, together with constraints on the states (position, velocity) and inputs (thruster force, angle). The Least Squares Temporal Difference (LSTD)-based Deterministic Policy Gradient (DPG) method is then applied to update the policy parameters. Our simulation results show that the proposed MPC-LSTD-based DPG method can improve the closed-loop performance during learning for the ASV freight mission problem.
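For readers unfamiliar with the method named in the abstract, the following is a minimal sketch of a generic deterministic policy gradient update for a policy defined by a parametrized MPC scheme. The notation (\pi_\theta, Q, L, \gamma, \alpha) and the use of a discounted cost are assumptions made here for illustration and are not taken from the paper; the paper's exact formulation, including how the LSTD critic is constructed, may differ.

```latex
% Policy: first input of the parametrized MPC solution at state s (assumed notation)
\pi_\theta(s) = u_0^\star(s, \theta)

% Closed-loop performance with stage cost L and discount factor \gamma (assumed)
J(\pi_\theta) = \mathbb{E}\left[ \sum_{k=0}^{\infty} \gamma^k \, L\big(s_k, \pi_\theta(s_k)\big) \right]

% Deterministic policy gradient (standard form for a deterministic policy)
\nabla_\theta J(\pi_\theta) = \mathbb{E}_{s}\left[ \nabla_\theta \pi_\theta(s) \, \nabla_a Q^{\pi_\theta}(s, a) \big|_{a = \pi_\theta(s)} \right]

% Gradient step with step size \alpha > 0 (descent, since J is a cost)
\theta \leftarrow \theta - \alpha \, \nabla_\theta J(\pi_\theta)
```

In a sketch of this kind, the critic term \nabla_a Q^{\pi_\theta} would be estimated from closed-loop data (by LSTD, according to the abstract), while \nabla_\theta \pi_\theta would typically be obtained from the sensitivity of the MPC solution with respect to \theta.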
Pages: 2990-2995
Number of pages: 6