Learning to Drive Like Human Beings: A Method Based on Deep Reinforcement Learning

Cited by: 29
Authors
Tian, Yantao [1 ,2 ]
Cao, Xuanhao [1 ]
Huang, Kai [1 ]
Fei, Cong [3 ]
Zheng, Zhu [4 ]
Ji, Xuewu [3 ]
Affiliations
[1] Jilin Univ, Dept Control Sci & Engn, Changchun 130022, Peoples R China
[2] Jilin Univ, Key Lab Bion Engn, Minist Educ, Changchun 130022, Peoples R China
[3] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
[4] Chongqing Jiaotong Univ, Sch Mechatron & Vehicle Engn, Chongqing 400074, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Autonomous driving; path tracking; imitation learning; reinforcement learning; NEURAL-NETWORKS; VEHICLE CONTROL;
DOI
10.1109/TITS.2021.3055899
Chinese Library Classification (CLC)
TU [Building Science]
Discipline Classification Code
0813
Abstract
In this paper, a new framework for path tracking is proposed through learning to drive like human beings. First, an imitation learning algorithm (behavior cloning) initializes the deep reinforcement learning (DRL) model by learning from professional drivers' experience. Second, a continuous, deterministic, model-free DRL algorithm optimizes the model online through trial and error. By combining behavior cloning and deep reinforcement learning, the DRL model quickly learns an effective path-tracking policy using easy-to-measure vehicle state parameters and environment information as inputs. The DRL algorithm adopts an Actor-Critic structure. To speed up convergence and improve learning, we propose a dual-actor-network structure for the two action outputs (steering wheel angle and vehicle speed), with a single chief critic network that guides the updates of both actor networks at the same time. This dual-actor structure also lets us select the state information most relevant to each action output as its input. In addition, a reward mechanism for autonomous driving is presented. Finally, simulation training and experimental tests are carried out; the results confirm that the proposed framework is more data-efficient than the original algorithm, and that the trained DRL model tracks the reference path accurately and generalizes to different roads.
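The dual-actor, single-critic structure described in the abstract lends itself to a short sketch. The following PyTorch code is a minimal illustration only, not the authors' implementation: the layer sizes, state dimensions, per-actor state subsets, and DDPG-style update are all assumptions the abstract does not specify.

import torch
import torch.nn as nn

class Actor(nn.Module):
    # One actor per action output: steering wheel angle or vehicle speed.
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),  # action normalized to [-1, 1]
        )

    def forward(self, state):
        return self.net(state)

class ChiefCritic(nn.Module):
    # A single critic scores the joint (state, steering, speed) tuple and
    # guides the updates of both actors at the same time.
    def __init__(self, state_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 2, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),  # scalar Q-value
        )

    def forward(self, state, steer, speed):
        return self.net(torch.cat([state, steer, speed], dim=-1))

# Each actor sees a hand-picked subset of the full state ("pick out some
# more important state information ... for different action outputs");
# these index lists are placeholders, not the paper's choices.
STEER_IDX = [0, 1, 2, 3]  # e.g. lateral error, heading error, curvature
SPEED_IDX = [2, 3, 4, 5]  # e.g. curvature, current speed

full_state = torch.randn(32, 6)  # a batch of 32 dummy states
actor_steer = Actor(len(STEER_IDX))
actor_speed = Actor(len(SPEED_IDX))
critic = ChiefCritic(full_state.shape[-1])

steer = actor_steer(full_state[:, STEER_IDX])
speed = actor_speed(full_state[:, SPEED_IDX])

# Behavior-cloning warm start: regress both actors onto expert actions
# (here random stand-ins for recorded professional-driver data) before
# any reinforcement learning.
expert_steer = torch.randn(32, 1).clamp(-1, 1)
expert_speed = torch.randn(32, 1).clamp(-1, 1)
bc_loss = (nn.functional.mse_loss(steer, expert_steer)
           + nn.functional.mse_loss(speed, expert_speed))

# DDPG-style actor objective: ascend the chief critic's Q-value, so each
# actor's gradient reflects the combined effect of steering and speed.
actor_loss = -critic(full_state, steer, speed).mean()
print(bc_loss.item(), actor_loss.item())

Because the chief critic receives both actions jointly, each actor is updated with respect to the combined effect of steering and speed, which is one plausible reading of a single critic guiding "the updating process of dual actor networks at the same time."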
Pages: 6357-6367 (11 pages)