APF-DPPO: An Automatic Driving Policy Learning Method Based on the Artificial Potential Field Method to Optimize the Reward Function

Cited by: 9
Authors
Lin, Junqiang [1 ]
Zhang, Po [1 ]
Li, Chengen [1 ]
Zhou, Yipeng [3 ]
Wang, Hongjun [1 ,2 ]
Zou, Xiangjun [1 ,2 ,4 ]
Affiliations
[1] South China Agr Univ, Coll Engn, Guangzhou 510642, Peoples R China
[2] Guangdong Lab Lingnan Modern Agr, Guangzhou 510642, Peoples R China
[3] Ningbo Univ, Maritime Acad, Ningbo 315000, Peoples R China
[4] Foshan Zhongke Innovat Res Inst Intelligent Agr &, Foshan 528000, Peoples R China
Keywords
deep reinforcement learning; proximal policy optimization; autonomous driving; driving strategy; artificial potential field method; reward function; transfer learning;
DOI
10.3390/machines10070533
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
To address the difficulty of obtaining an optimal driving strategy for autonomous vehicles in complex environments with changeable tasks, this paper proposes an end-to-end driving strategy learning method based on deep reinforcement learning. The target-attraction and obstacle-repulsion ideas of the artificial potential field method are introduced into the distributed proximal policy optimization algorithm, establishing the APF-DPPO learning model. To overcome the range-repulsion problem of the artificial potential field method, which interferes with learning the optimal driving strategy, this paper proposes a directional penalty function that combines a collision penalty and a yaw penalty, converting the range-based obstacle penalty into a single directional penalty, and establishes a vehicle motion collision model. Finally, the APF-DPPO model is used to train the driving strategy of a virtual vehicle, and transfer learning is used for comparative verification experiments. Simulation results show that the completion rate of the virtual vehicle in an obstacle environment that generates penalty feedback reaches 96.3%, which is 3.8% higher than in an environment without penalty feedback. Among the reward functions compared, the proposed method obtains the highest cumulative reward within 500 s, an improvement of 69 points over the reward function based on the artificial potential field method alone, and shows higher adaptability and robustness across environments. The experimental results show that this method effectively improves the efficiency of driving strategy learning, enables the virtual vehicle to make autonomous driving behavior decisions, and provides reliable theoretical and technical support for autonomous driving decision-making in real vehicles.
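The abstract describes a reward function that keeps the artificial potential field's goal attraction but replaces range-based obstacle repulsion with a single directional penalty (collision penalty plus yaw penalty). A minimal sketch of that idea is shown below; the function name, weights, and exact term forms are illustrative assumptions, not the paper's actual formulation.

```python
import math

def apf_dppo_reward(vehicle_pos, goal_pos, heading, collided,
                    w_attract=1.0, w_yaw=0.5, w_collide=100.0):
    """Hypothetical APF-inspired reward: attraction minus directional penalties.

    Instead of a repulsive field active within a range of every obstacle,
    penalties apply only on actual collision (collision penalty) or when
    the heading deviates from the goal direction (yaw penalty).
    """
    # Attraction term: closer to the goal yields a higher (less negative) reward
    r_attract = -w_attract * math.dist(vehicle_pos, goal_pos)

    # Yaw penalty: absolute angle between current heading and goal bearing,
    # wrapped to [0, pi] via atan2 of the sine/cosine of the difference
    goal_bearing = math.atan2(goal_pos[1] - vehicle_pos[1],
                              goal_pos[0] - vehicle_pos[0])
    yaw_err = abs(math.atan2(math.sin(heading - goal_bearing),
                             math.cos(heading - goal_bearing)))
    r_yaw = -w_yaw * yaw_err

    # Collision penalty: a one-shot penalty flagged by the simulator,
    # replacing the range-based repulsive field
    r_collide = -w_collide if collided else 0.0

    return r_attract + r_yaw + r_collide
```

In this sketch the reward increases monotonically as the vehicle approaches the goal with correct heading, while a collision dominates all other terms, which is the qualitative behavior the directional-penalty design aims for.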
Pages: 24
Related Papers (45 total)
[1]   Deep Reinforcement Learning A brief survey [J].
Arulkumaran, Kai ;
Deisenroth, Marc Peter ;
Brundage, Miles ;
Bharath, Anil Anthony .
IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) :26-38
[2]   Combined Energy-oriented Path Following and Collision Avoidance approach for Autonomous Electric Vehicles via Nonlinear Model Predictive Control [J].
Bifulco, Gennaro Nicola ;
Coppola, Angelo ;
Loizou, Savvas G. ;
Petrillo, Alberto ;
Santini, Stefania .
2021 21ST IEEE INTERNATIONAL CONFERENCE ON ENVIRONMENT AND ELECTRICAL ENGINEERING AND 2021 5TH IEEE INDUSTRIAL AND COMMERCIAL POWER SYSTEMS EUROPE (EEEIC/I&CPS EUROPE), 2021,
[3]  
Bojarski Mariusz, 2016, arXiv
[4]  
Borrelli F., 2005, International Journal of Vehicle Autonomous Systems, V3, P265, DOI 10.1504/IJVAS.2005.008237
[5]   A Multi-Objective Particle Swarm Optimization for Trajectory Planning of Fruit Picking Manipulator [J].
Cao, Xiaoman ;
Yan, Hansheng ;
Huang, Zhengyan ;
Ai, Si ;
Xu, Yongjun ;
Fu, Renxuan ;
Zou, Xiangjun .
AGRONOMY-BASEL, 2021, 11 (11)
[6]  
Chae H, 2017, IEEE INT C INTELL TR
[7]   DeepDriving: Learning Affordance for Direct Perception in Autonomous Driving [J].
Chen, Chenyi ;
Seff, Ari ;
Kornhauser, Alain ;
Xiao, Jianxiong .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :2722-2730
[8]   Plant Disease Recognition Model Based on Improved YOLOv5 [J].
Chen, Zhaoyi ;
Wu, Ruhui ;
Lin, Yiyan ;
Li, Chuyu ;
Chen, Siyu ;
Yuan, Zhineng ;
Chen, Shiwei ;
Zou, Xiangjun .
AGRONOMY-BASEL, 2022, 12 (02)
[9]   Integration of renewable energy sources, energy storage systems, and electrical vehicles with smart power distribution networks [J].
Di Fazio, A. R. ;
Erseghe, T. ;
Ghiani, E. ;
Murroni, M. ;
Siano, P. ;
Silvestro, F. .
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2013, 4 (06) :663-671
[10]   A reinforced random forest model for enhanced crop yield prediction by integrating agrarian parameters [J].
Elavarasan, Dhivya ;
Vincent, P. M. Durai Raj .
JOURNAL OF AMBIENT INTELLIGENCE AND HUMANIZED COMPUTING, 2021, 12 (11) :10009-10022