Stabilization Approaches for Reinforcement Learning-Based End-to-End Autonomous Driving

Cited: 51
Authors
Chen, Siyuan [1 ]
Wang, Meiling [1 ]
Song, Wenjie [1 ]
Yang, Yi [1 ]
Li, Yujun [2 ]
Fu, Mengyin [1 ,3 ]
Affiliations
[1] Beijing Inst Technol, State Key Lab Intelligent Control & Decis Complex, Beijing 100081, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Comp Sci, Shanghai 200240, Peoples R China
[3] Nanjing Univ Sci & Technol, Nanjing 210014, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China
Keywords
Deep reinforcement learning; autonomous driving; end-to-end; stabilization
DOI
10.1109/TVT.2020.2979493
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Deep reinforcement learning (DRL) has been successfully applied to end-to-end autonomous driving, especially in simulation environments. However, common DRL approaches used in complex autonomous driving scenarios are sometimes unstable or difficult to converge. This paper proposes two approaches to improve the stability of policy model training with as little manually collected data as possible. In the first approach, reinforcement learning is combined with imitation learning: a small amount of demonstration data is used to train a feature network whose weights initialize the policy model. In the second approach, an auxiliary network is added to the reinforcement learning framework; it leverages real-time measurement information to deepen the model's understanding of the environment, without any guidance from demonstrators. To verify the effectiveness of these two approaches, simulations are conducted in image-based and lidar-based end-to-end autonomous driving systems, respectively. The approaches are tested not only in a virtual game world but also in Gazebo, where we build a 3D world based on the real vehicle model of the Ranger XP900 platform, real 3D obstacle models, and real motion constraints with inertial characteristics, so that the trained end-to-end autonomous driving model is better suited to the real world. Experimental results show that performance improves by over 45% in the virtual game world, and that training converges quickly and stably in Gazebo, where previous methods can hardly converge.
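
The abstract describes the two stabilization approaches only at a high level. As a rough illustration (not the authors' implementation), the following minimal PyTorch sketch shows one plausible form of each: behavior-cloning pretraining of a shared feature network on a small demonstration set (approach 1), and an auxiliary head that regresses real-time measurements as an extra, demonstrator-free training signal (approach 2). All names (FeatureNet, PolicyHead, AuxiliaryHead, pretrain_with_imitation, auxiliary_loss), the flat-vector observation format, and the MSE losses are assumptions made for the sketch.

import torch
import torch.nn as nn

class FeatureNet(nn.Module):
    # Shared feature extractor over a flattened observation (image or lidar).
    def __init__(self, obs_dim: int, feat_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(),
            nn.Linear(256, feat_dim), nn.ReLU(),
        )

    def forward(self, obs):
        return self.net(obs)

class PolicyHead(nn.Module):
    # Maps features to continuous driving actions (e.g., steering, throttle).
    def __init__(self, feat_dim: int, act_dim: int = 2):
        super().__init__()
        self.fc = nn.Linear(feat_dim, act_dim)

    def forward(self, feat):
        return torch.tanh(self.fc(feat))

class AuxiliaryHead(nn.Module):
    # Regresses real-time measurements (e.g., speed, heading) from features,
    # providing a supervised signal that needs no demonstrator.
    def __init__(self, feat_dim: int, meas_dim: int = 4):
        super().__init__()
        self.fc = nn.Linear(feat_dim, meas_dim)

    def forward(self, feat):
        return self.fc(feat)

def pretrain_with_imitation(feature_net, policy_head, demos, epochs=10):
    # Approach 1: behavior cloning on a small set of (observation, expert
    # action) tensor pairs to initialize the feature network before RL training.
    params = list(feature_net.parameters()) + list(policy_head.parameters())
    opt = torch.optim.Adam(params, lr=1e-3)
    for _ in range(epochs):
        for obs, expert_act in demos:
            opt.zero_grad()
            loss = nn.functional.mse_loss(policy_head(feature_net(obs)), expert_act)
            loss.backward()
            opt.step()

def auxiliary_loss(feature_net, aux_head, obs, measurements):
    # Approach 2: auxiliary regression loss, to be added to the RL objective,
    # e.g. total_loss = rl_loss + lambda_aux * auxiliary_loss(...).
    return nn.functional.mse_loss(aux_head(feature_net(obs)), measurements)

In a full training loop, the auxiliary loss would be summed with the RL loss so that both gradients shape the shared feature network; the paper itself should be consulted for the actual architectures and training schedules.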
Pages: 4740-4750
Page Count: 11