A Novel Reinforcement Learning Collision Avoidance Algorithm for USVs Based on Maneuvering Characteristics and COLREGs

Cited by: 16
Authors
Fan, Yunsheng [1 ,2 ]
Sun, Zhe [1 ,2 ]
Wang, Guofeng [1 ,2 ]
Affiliations
[1] Dalian Maritime Univ, Coll Marine Elect Engn, Dalian 116026, Peoples R China
[2] Key Lab Technol & Syst Intelligent Ships Liaoning, Dalian 116026, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
unmanned surface vehicle; deep reinforcement learning; autonomous collision avoidance; COLREGs; LEVEL
DOI
10.3390/s22062099
CLC Number
O65 [Analytical Chemistry]
Discipline Codes
070302; 081704
Abstract
Autonomous collision avoidance technology provides an intelligent means for the safe and efficient navigation of unmanned surface vehicles (USVs). In this paper, the USV collision avoidance problem under the constraints of the International Regulations for Preventing Collisions at Sea (COLREGs) is studied, and a reinforcement learning collision avoidance (RLCA) algorithm that complies with USV maneuverability is proposed. Notably, the reinforcement learning agent requires no prior human knowledge of USV collision avoidance to learn effective avoidance maneuvers. The double-DQN method is used to reduce the overestimation of the action-value function, and a dueling network architecture is adopted to separate the estimation of the state value from that of the action advantage. To address the problem of agent exploration, a category-based exploration method built on the characteristics of USV collision avoidance is proposed to improve the exploration ability of the USV. Because a large number of turning behaviors in the early training steps can hinder learning, a mechanism that discards some of these transitions was designed, improving the effectiveness of the algorithm. A finite Markov decision process (MDP) that conforms to USV maneuverability and the COLREGs is used for agent training. The RLCA algorithm was tested in a marine simulation environment across many different USV encounter situations and achieved a higher average reward. The RLCA algorithm bridges the gap between USV navigation status information and collision avoidance behavior, successfully planning a safe and economical path to the terminal point.
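The two value-learning components named in the abstract, double-DQN target evaluation and a dueling network head, are standard deep-RL building blocks. The snippet below is a minimal PyTorch sketch of how they are typically combined; the class names, network sizes, and hyperparameters are illustrative assumptions, not the authors' implementation, which additionally includes the category-based exploration and transition-discarding mechanisms described above.

```python
# Minimal sketch (not the authors' code): dueling Q-network plus double-DQN target.
# All names, sizes, and hyperparameters below are illustrative assumptions.
import torch
import torch.nn as nn


class DuelingQNet(nn.Module):
    """Dueling head: separate streams estimate the state value V(s) and the
    action advantage A(s, a), then recombine them into Q(s, a)."""

    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps the V/A decomposition identifiable.
        return v + a - a.mean(dim=1, keepdim=True)


def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                      reward: torch.Tensor, next_state: torch.Tensor,
                      done: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """Double-DQN: the online network selects the greedy next action and the
    target network evaluates it, reducing overestimation of the Q-values."""
    with torch.no_grad():
        next_action = online(next_state).argmax(dim=1, keepdim=True)
        next_q = target(next_state).gather(1, next_action).squeeze(1)
        return reward + gamma * (1.0 - done) * next_q


# Illustrative usage with random data (state dimension 6, 3 discrete rudder actions):
if __name__ == "__main__":
    online_net, target_net = DuelingQNet(6, 3), DuelingQNet(6, 3)
    target_net.load_state_dict(online_net.state_dict())
    y = double_dqn_target(online_net, target_net,
                          reward=torch.zeros(4), next_state=torch.randn(4, 6),
                          done=torch.zeros(4))
    print(y.shape)  # torch.Size([4])
```

In a full training loop this target would feed a temporal-difference loss over transitions sampled from replay memory, after the transition-filtering step the abstract describes.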
Pages: 29