Reinforcement Learning with Uncertainty Estimation for Tactical Decision-Making in Intersections

Times cited: 20

Authors:

Hoel, Carl-Johan [1, 3, 4]

Tram, Tommy [2, 3, 4]

Sjoberg, Jonas [3]

Affiliations:

[1] Volvo Grp, Gothenburg, Sweden

[2] Zenu AB, Gothenburg, Sweden

[3] Chalmers Univ Technol, Gothenburg, Sweden

[4] AI Innovat Sweden, Gothenburg, Sweden

Source:

2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC) | 2020


DOI:

10.1109/itsc45102.2020.9294407

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology]

Subject classification codes:

0808; 0809

Abstract:

This paper investigates how a Bayesian reinforcement learning method can be used to create a tactical decision-making agent for autonomous driving in an intersection scenario, where the agent can estimate the confidence of its decisions. An ensemble of neural networks with additional randomized prior functions (RPF) is trained using a bootstrapped experience replay memory. The coefficient of variation of the Q-values estimated by the ensemble members is used to approximate the uncertainty, and a criterion is introduced that determines whether the agent is sufficiently confident to make a particular decision. The performance of the ensemble RPF method is evaluated in an intersection scenario and compared to a standard Deep Q-Network (DQN) method, which does not estimate the uncertainty. It is shown that the trained ensemble RPF agent can detect cases with high uncertainty, both in situations that are far from the training distribution and in situations that seldom occur within it. This work demonstrates one possible application of such a confidence estimate: using it to choose safe actions in unknown situations, which removes all collisions within the training distribution and most collisions outside of it.
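The abstract names three ingredients: ensemble members with randomized prior functions, a bootstrapped replay memory, and a confidence criterion based on the coefficient of variation c_v(s, a) = std_k Q_k(s, a) / mean_k Q_k(s, a). The following is a minimal sketch of how these pieces could fit together, not the authors' implementation. Assumptions: small linear models stand in for the paper's neural networks; each member computes Q_k = f_k + beta * p_k, where only f_k is trainable and p_k is a fixed random prior; a new experience is added to each member's memory independently with probability p_add; the absolute value of the mean (plus a small epsilon) is used in the c_v denominator; and the criterion picks the highest mean-Q action among those whose c_v is below a threshold, falling back to a designated safe action otherwise. The names RPFMember, bootstrap_mask, coefficient_of_variation and choose_action, and the parameters beta, p_add, cv_threshold and fallback_action, are all hypothetical.

```python
import numpy as np


class RPFMember:
    """Hypothetical ensemble member: Q_k(s, .) = f_k(s) + beta * p_k(s).

    Linear maps stand in for neural networks (an assumption). Only `w` would
    be trained; `prior` is drawn once at random and then kept fixed.
    """

    def __init__(self, n_features: int, n_actions: int, beta: float,
                 rng: np.random.Generator):
        self.w = rng.normal(scale=0.1, size=(n_features, n_actions))  # trainable part f_k
        self.prior = rng.normal(size=(n_features, n_actions))         # fixed random prior p_k
        self.beta = beta                                              # prior scale factor

    def q_values(self, state: np.ndarray) -> np.ndarray:
        # One Q-value per action for the given state feature vector.
        return state @ self.w + self.beta * (state @ self.prior)


def bootstrap_mask(n_members: int, p_add: float,
                   rng: np.random.Generator) -> np.ndarray:
    """Decide which members' replay memories receive a new experience.

    Each member gets the transition independently with probability p_add,
    so the members train on different (bootstrapped) subsets of the data.
    """
    return rng.random(n_members) < p_add


def coefficient_of_variation(q: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Per-action c_v; q has shape (K, n_actions), one row per ensemble member."""
    mean = q.mean(axis=0)
    std = q.std(axis=0)
    return std / (np.abs(mean) + eps)


def choose_action(q: np.ndarray, cv_threshold: float,
                  fallback_action: int) -> int:
    """Return the highest mean-Q action whose c_v is below the threshold;
    if no action is confident enough, return the fallback (safe) action."""
    mean = q.mean(axis=0)
    confident = coefficient_of_variation(q) < cv_threshold
    if not confident.any():
        return fallback_action
    masked = np.where(confident, mean, -np.inf)  # exclude uncertain actions
    return int(np.argmax(masked))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ensemble = [RPFMember(n_features=4, n_actions=3, beta=3.0, rng=rng)
                for _ in range(10)]
    state = rng.normal(size=4)
    q = np.stack([m.q_values(state) for m in ensemble])  # shape (10, 3)
    print(choose_action(q, cv_threshold=0.02, fallback_action=0))
```

Because the untrained members disagree strongly, the example will usually return the fallback action; as training drives the members toward similar Q-values in frequently visited states, c_v shrinks and the agent acts on its own estimate there. The fixed prior keeps members from collapsing to the same function in regions with little data, which is what lets high c_v flag unfamiliar situations.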

Pages: 7

Cited references: 29
