Hierarchical reinforcement learning for self-driving decision-making without reliance on labelled driving data

Cited by: 146
Authors
Duan, Jingliang [1 ]
Eben Li, Shengbo [1 ]
Guan, Yang [1 ]
Sun, Qi [1 ]
Cheng, Bo [1 ]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
Keywords
neural nets; decision making; learning (artificial intelligence); motion control; road traffic control; control engineering computing; driver information systems; parallel processing; self-driving cars; labelled driving data; high-level manoeuvre selection; low-level motion control; asynchronous parallel reinforcement learners; driving decisions; highway driving scenario; decision-making; supervised learning; hierarchical reinforcement learning; driving in lane; right lane change; left lane change; fully-connected neural networks; AUTONOMOUS VEHICLES; ROAD; GAME; GO;
DOI
10.1049/iet-its.2019.0317
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject classification codes
0808; 0809
Abstract
Decision making for self-driving cars is usually tackled by manually encoding rules from drivers' behaviours or by imitating drivers' manipulation with supervised learning techniques. Both approaches rely on massive amounts of driving data to cover all possible driving scenarios. This study presents a hierarchical reinforcement learning method for decision making of self-driving cars that does not depend on a large amount of labelled driving data. The method comprehensively considers both high-level manoeuvre selection and low-level motion control in the lateral and longitudinal directions. The authors first decompose driving tasks into three manoeuvres, namely driving in lane, right lane change and left lane change, and learn a sub-policy for each manoeuvre. A master policy is then learned to choose which manoeuvre policy to execute in the current state. All policies, including the master policy and the manoeuvre policies, are represented by fully-connected neural networks and trained using asynchronous parallel reinforcement learners, which build a mapping from sensory outputs to driving decisions. Different state spaces and reward functions are designed for each manoeuvre. The authors apply this method to a highway driving scenario, demonstrating that it realises smooth and safe decision making for self-driving cars.
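To make the described architecture concrete, below is a minimal PyTorch sketch of the hierarchy the abstract outlines: a master policy network selecting among three manoeuvre sub-policies, each a fully-connected network over its own state space. All names (FullyConnectedPolicy, HierarchicalDriver), layer widths, state dimensions and the greedy manoeuvre selection are illustrative assumptions, not the paper's implementation; the asynchronous parallel training loop and the per-manoeuvre reward design are not reproduced.

```python
import torch
import torch.nn as nn

# The three manoeuvres the paper decomposes highway driving into.
MANOEUVRES = ("drive_in_lane", "right_lane_change", "left_lane_change")


class FullyConnectedPolicy(nn.Module):
    """A plain fully-connected network mapping a state vector to outputs."""

    def __init__(self, state_dim: int, out_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)


class HierarchicalDriver:
    """Master policy selects a manoeuvre; the chosen sub-policy emits the
    lateral/longitudinal control command for that manoeuvre."""

    def __init__(self, master_state_dim: int, sub_state_dims: dict, control_dim: int):
        # Master network: one logit per manoeuvre.
        self.master = FullyConnectedPolicy(master_state_dim, len(MANOEUVRES))
        # One sub-policy per manoeuvre, each with its own state space
        # (the paper designs different state spaces and rewards per manoeuvre).
        self.subs = {m: FullyConnectedPolicy(sub_state_dims[m], control_dim)
                     for m in MANOEUVRES}

    def act(self, master_state: torch.Tensor, sub_states: dict):
        with torch.no_grad():
            idx = self.master(master_state).argmax().item()
            manoeuvre = MANOEUVRES[idx]
            control = self.subs[manoeuvre](sub_states[manoeuvre])
        return manoeuvre, control


# Illustrative usage with made-up state dimensions.
dims = {"drive_in_lane": 18, "right_lane_change": 26, "left_lane_change": 26}
driver = HierarchicalDriver(master_state_dim=30, sub_state_dims=dims, control_dim=2)
manoeuvre, control = driver.act(
    torch.randn(30), {m: torch.randn(d) for m, d in dims.items()})
print(manoeuvre, control)
```

The greedy argmax here merely stands in for whatever action-selection rule the learned master policy uses at execution time; in the paper the networks are trained with asynchronous parallel reinforcement learners, which this sketch does not attempt to show.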
Pages: 297-305
Number of pages: 9