Reinforcement learning-based collision avoidance: impact of reward function and knowledge transfer

Cited by: 19
Authors
Liu, Xiongqing [1 ]
Jin, Yan [1 ]
Affiliations
[1] Univ Southern Calif, Dept Aerosp & Mech Engn, 3650 McClintock Ave,OHE-430, Los Angeles, CA 90089 USA
Source
AI EDAM-ARTIFICIAL INTELLIGENCE FOR ENGINEERING DESIGN ANALYSIS AND MANUFACTURING | 2020, Vol. 34, No. 2
Keywords
Agent-based systems; autonomous vehicle; collision avoidance; deep reinforcement learning; machine learning; TIME OBSTACLE AVOIDANCE; GAME; GO;
DOI
10.1017/S0890060420000141
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Collision avoidance for robots and vehicles in unpredictable environments is a challenging task. Various control strategies have been developed for the agent (i.e., a robot or vehicle) to sense the environment, assess the situation, and select optimal actions to avoid collisions and accomplish its mission. In our research on autonomous ships, we take a machine learning approach to collision avoidance. Because ship steering data from human ship masters is scarce, collision avoidance knowledge must be acquired through reinforcement learning (RL). Since the learned neural network tends to be a black box, a method is needed for designing an agent's behavior so that the desired knowledge can be captured. Furthermore, RL on complex tasks can be time-consuming or even infeasible; a multi-stage learning method is needed in which agents first learn simple tasks and then transfer the learned knowledge to closely related but more complex tasks. In this paper, we explore ways of designing agent behaviors by tuning reward functions and devise a transfer RL method for multi-stage knowledge acquisition. Computer simulation-based agent training results show that it is important to understand the role of each component in a reward function and of the various design parameters in transfer RL. The settings of these parameters all depend on the complexity of the tasks and the similarities between them.
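The abstract emphasizes understanding the role of each component in a reward function. As an illustration only (the function, weights, and terms below are hypothetical and not taken from the paper), a shaped collision-avoidance reward might combine a goal-progress term, a proximity penalty near obstacles, and a large collision penalty:

```python
import math

def shaped_reward(pos, goal, obstacles, collided,
                  safe_radius=5.0, w_progress=1.0, w_danger=0.5,
                  collision_penalty=-100.0):
    """Hypothetical shaped reward for collision avoidance.

    pos, goal  : (x, y) tuples
    obstacles  : list of (x, y) obstacle positions
    collided   : whether the agent hit an obstacle this step
    """
    if collided:
        # Terminal penalty dominates all other terms.
        return collision_penalty
    # Progress term: negative distance to goal rewards approaching it.
    r = -w_progress * math.dist(pos, goal)
    # Danger term: penalize proximity to the nearest obstacle
    # once it is inside the safety radius.
    if obstacles:
        d_min = min(math.dist(pos, o) for o in obstacles)
        if d_min < safe_radius:
            r -= w_danger * (safe_radius - d_min)
    return r
```

Tuning the relative weights (w_progress vs. w_danger vs. the collision penalty) changes the learned behavior, e.g. how wide a berth the agent gives obstacles versus how directly it heads for the goal, which is the kind of reward-design trade-off the paper studies.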
Pages: 207-222
Page count: 16