MPC-based Reinforcement Learning for a Simplified Freight Mission of Autonomous Surface Vehicles

Cited by: 20

Authors:

Cai, Wenqi [1]

Kordabad, Arash B. [1]

Esfahani, Hossein N. [1]

Lekkas, Anastasios M. [1]

Gros, Sebastien [1]

Affiliations:

[1] Norwegian Univ Sci & Technol NTNU, Dept Engn Cybernet, Trondheim, Norway

Source:

2021 60th IEEE Conference on Decision and Control (CDC) | 2021

DOI:

10.1109/CDC45484.2021.9683750

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In this work, we propose a Model Predictive Control (MPC)-based Reinforcement Learning (RL) method for Autonomous Surface Vehicles (ASVs). The objective is to find a policy that optimizes the closed-loop performance of a simplified freight mission, which comprises collision-free path following, autonomous docking, and a skillful transition between the two. We approximate the optimal policy with a parametrized MPC scheme that accounts for path-following and docking costs and for constraints on the states (position, velocity) and inputs (thruster force, angle). The Least Squares Temporal Difference (LSTD)-based Deterministic Policy Gradient (DPG) method is then applied to update the policy parameters. Our simulation results demonstrate that the proposed MPC-LSTD-based DPG method can improve the closed-loop performance during learning for the freight mission problem of ASVs.

Citation

Pages: 2990-2995

Number of pages: 6

References: 14 in total

[1] Cai W., 2021, 2021 60 IEEE C DECIS, P6365

[2] Camacho E. F., 2013, Model predictive control, DOI 10.1007/978-0-85729-398-5

[3] Gros S., 2020, arXiv:2004.01430

[4] Data-Driven Economic NMPC Using Reinforcement Learning [J]. Gros, Sebastien; Zanon, Mario. IEEE Transactions on Automatic Control, 2020, 65(2): 636-648

[5] Antidisturbance Coordinated Path Following Control of Robotic Autonomous Surface Vehicles: Theory and Experiment [J]. Gu, Nan; Peng, Zhouhua; Wang, Dan; Shi, Yang; Wang, Tianlin. IEEE/ASME Transactions on Mechatronics, 2019, 24(5): 2386-2396

[6] Constrained nonlinear control allocation with singularity avoidance using sequential quadratic programming [J]. Johansen, TA; Fossen, TI; Berge, SP. IEEE Transactions on Control Systems Technology, 2004, 12(1): 211-216

[7] Kordabad A. B., 2021, arXiv:2104.02411

[8] Kordabad A. B., 2021, arXiv:2106.03541

[9] Lagoudakis M. G., 2003, J. Mach. Learn. Res., V4, P1107

[10] Optimization-Based Automatic Docking and Berthing of ASVs Using Exteroceptive Sensors: Theory and Experiments [J]. Martinsen, Andreas B.; Bitar, Glenn; Lekkas, Anastasios M.; Gros, Sebastien. IEEE Access, 2020, 8: 204974-204986
