Safe-State Enhancement Method for Autonomous Driving via Direct Hierarchical Reinforcement Learning

Cited by: 17
Authors
Gu, Ziqing [1 ]
Gao, Lingping [2 ]
Ma, Haitong [3 ]
Li, Shengbo Eben [1 ]
Zheng, Sifa [1 ]
Jing, Wei [2 ]
Chen, Junbo [2 ]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, State Key Lab Automot Safety & Energy, Beijing 100084, Peoples R China
[2] Alibaba Grp, Hangzhou 310000, Peoples R China
[3] Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding
National Natural Science Foundation of China;
Keywords
Safety; Autonomous vehicles; Reinforcement learning; Vehicle dynamics; Training; Trajectory; Markov processes; Autonomous vehicle; decision making; reinforcement learning (RL); safety enhancement;
DOI
10.1109/TITS.2023.3271642
CLC Classification Number
TU [Building Science];
Discipline Code
0813;
Abstract
Reinforcement learning (RL) has shown excellent performance in sequential decision-making problems, where safety in the form of state constraints is of great significance to the design and application of RL. Simple constrained end-to-end RL methods might fail significantly in a complex system like an autonomous vehicle. In contrast, some hierarchical RL (HRL) methods generate driving goals directly, which can be closely combined with motion planning. To meet safety requirements, some safety-enhanced RL methods add post-processing modules to avoid unsafe goals, or pursue expectation-based safety, which accepts the existence of unsafe states and allows some violations of the safety constraints. However, ensuring state safety is vital for autonomous vehicles. Therefore, this paper proposes a state-based safety enhancement method for autonomous driving via direct hierarchical reinforcement learning. Specifically, we design a constrained reinforcement learner based on the State-based Constrained Markov Decision Process (SCMDP), in which a learnable safety module adjusts the constraint strength adaptively. We integrate a dynamics module into policy training and generate future goals that account for safety, temporal-spatial continuity, and dynamic feasibility, eliminating dependence on a prior model. Simulations in typical highway scenes with uncertainties show that the proposed method achieves better training performance, higher driving safety in interactive scenes, more intelligent decisions in traffic congestion, and more economical driving on roads with changing slopes.
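The abstract's "learnable safety module" that adjusts constraint strength adaptively is, in constrained RL generally, often realized as a Lagrange multiplier updated by dual ascent on the observed constraint violation. The sketch below illustrates that generic mechanism only; it is not the paper's implementation, and all names (`update_multiplier`, `penalized_return`, `tolerance`) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): an adaptive
# constraint strength realized as a Lagrange multiplier, as commonly
# done in constrained RL. All names are hypothetical.

def update_multiplier(lam: float, violation: float,
                      tolerance: float = 0.0, lr: float = 0.1) -> float:
    """Dual-ascent step: the multiplier grows while the average
    state-constraint violation exceeds the tolerance, and is clamped
    at zero so the penalty never becomes a bonus."""
    return max(0.0, lam + lr * (violation - tolerance))

def penalized_return(reward: float, cost: float, lam: float) -> float:
    """Objective seen by the policy: reward minus the adaptively
    weighted safety cost."""
    return reward - lam * cost

# Toy training loop: per-iteration average violations shrink to zero,
# so the multiplier rises at first and stops growing once violations vanish.
lam = 0.0
for violation in [0.5, 0.4, 0.2, 0.0, 0.0]:
    lam = update_multiplier(lam, violation)
```

The constraint strength thus tracks how unsafe the current policy is, which mirrors the adaptive behavior the abstract attributes to the safety module.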
Pages: 9966-9983
Page count: 18