Curriculum Proximal Policy Optimization with Stage-Decaying Clipping for Self-Driving at Unsignalized Intersections

Cited by: 2

Authors:

Peng, Zengqi [1]

Zhou, Xiao [1]

Wang, Yubin [1]

Zheng, Lei [1]

Liu, Ming [1,2,3]

Ma, Jun [1,2,3]

Affiliations:

[1] Hong Kong Univ Sci & Technol Guangzhou, Robot & Autonomous Syst Thrust, Guangzhou, Peoples R China

[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China

[3] HKUST Shenzhen Hong Kong Collaborat Innovat Res I, Shenzhen, Peoples R China

Source:

2023 IEEE 26TH INTERNATIONAL CONFERENCE ON INTELLIGENT TRANSPORTATION SYSTEMS, ITSC | 2023

Keywords:

VEHICLES;

DOI:

10.1109/ITSC57777.2023.10422594

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Unsignalized intersections are typically considered one of the most representative and challenging scenarios for self-driving vehicles. To tackle autonomous driving problems in such scenarios, this paper proposes a curriculum proximal policy optimization (CPPO) framework with stage-decaying clipping. By adjusting the clipping parameter of proximal policy optimization (PPO) across different stages of training, the vehicle can first rapidly search for an approximately optimal policy or its neighborhood with a large clipping parameter, and then converge to the optimal policy with a small one. In particular, a stage-based curriculum learning technique is incorporated into the proposed framework to improve generalization performance and further accelerate training. Moreover, the reward function is specially designed for the different curriculum settings. A series of comparative experiments is conducted in intersection-crossing scenarios with bi-lane carriageways to verify the effectiveness of the proposed CPPO method. The results show that the proposed approach adapts better to different dynamic and complex environments and trains faster than the baseline methods.
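The stage-decaying clipping idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stage boundaries, the clipping values, and the names `STAGE_CLIP_SCHEDULE`, `clip_epsilon`, and `clipped_surrogate` are hypothetical choices made here for illustration. Only the clipped surrogate itself follows the standard PPO form, min(r * A, clip(r, 1 - eps, 1 + eps) * A).

```python
# Hypothetical schedule: (first episode of the stage, clipping parameter).
# The numbers are illustrative assumptions, not values from the paper.
STAGE_CLIP_SCHEDULE = [(0, 0.3), (1000, 0.2), (2000, 0.1)]


def clip_epsilon(episode, schedule=STAGE_CLIP_SCHEDULE):
    """Return the clipping parameter of the stage that contains `episode`.

    The schedule is assumed to be sorted by starting episode, so the last
    entry whose start is <= episode wins.
    """
    eps = schedule[0][1]
    for start, value in schedule:
        if episode >= start:
            eps = value
    return eps


def clipped_surrogate(ratio, advantage, eps):
    """Standard PPO clipped objective for a single sample.

    ratio     -- probability ratio pi_new(a|s) / pi_old(a|s)
    advantage -- advantage estimate for the sampled action
    eps       -- clipping parameter for the current training stage
    """
    clipped_ratio = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The same sample is clipped more tightly in later stages.
    for episode in (0, 1500, 2500):
        eps = clip_epsilon(episode)
        print(episode, eps, clipped_surrogate(1.5, 1.0, eps))
```

With a large clipping parameter early on, a policy update may move the probability ratio further from 1 before the objective stops rewarding it; shrinking the parameter in later stages restricts each update to a narrower region around the current policy.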


Pages: 5027-5033

Number of pages: 7
