Safe-State Enhancement Method for Autonomous Driving via Direct Hierarchical Reinforcement Learning

Cited by: 17
Authors
Gu, Ziqing [1 ]
Gao, Lingping [2 ]
Ma, Haitong [3 ]
Li, Shengbo Eben [1 ]
Zheng, Sifa [1 ]
Jing, Wei [2 ]
Chen, Junbo [2 ]
Affiliations
[1] Tsinghua Univ, Sch Vehicle & Mobil, State Key Lab Automot Safety & Energy, Beijing 100084, Peoples R China
[2] Alibaba Grp, Hangzhou 310000, Peoples R China
[3] Harvard John A Paulson Sch Engn & Appl Sci, Cambridge, MA 02138 USA
Funding
National Natural Science Foundation of China;
Keywords
Safety; Autonomous vehicles; Reinforcement learning; Vehicle dynamics; Training; Trajectory; Markov processes; Autonomous vehicle; decision making; reinforcement learning (RL); safety enhancement;
DOI
10.1109/TITS.2023.3271642
CLC Classification Number
TU [Building Science];
Discipline Code
0813;
Abstract
Reinforcement learning (RL) has shown excellent performance in sequential decision-making problems, where safety in the form of state constraints is of great significance to the design and application of RL. Simple constrained end-to-end RL methods might fail significantly in a complex system like an autonomous vehicle. In contrast, some hierarchical RL (HRL) methods generate driving goals directly, which can be closely combined with motion planning. To meet safety requirements, some safety-enhanced RL methods add post-processing modules to avoid unsafe goals, or pursue expectation-based safety, which accepts the existence of unsafe states and allows some violations of the safety constraints. However, ensuring state safety is vital for autonomous vehicles. Therefore, this paper proposes a state-based safety enhancement method for autonomous driving via direct hierarchical reinforcement learning. Specifically, we design a constrained reinforcement learner based on the State-based Constrained Markov Decision Process (SCMDP), in which a learnable safety module adjusts the constraint strength adaptively. We integrate a dynamics module into policy training and generate future goals that account for safety, temporal-spatial continuity, and dynamic feasibility, eliminating dependence on a prior model. Simulations in typical highway scenes with uncertainties show that the proposed method achieves better training performance, higher driving safety in interactive scenes, more intelligent decisions in traffic congestion, and more economical driving on roads with changing slopes.
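The abstract's "learnable safety module" that adjusts constraint strength adaptively is, in constrained RL generally, often realized as a Lagrange multiplier updated by dual ascent on the observed constraint violation. The sketch below illustrates that generic mechanism only; it is not the paper's implementation, and all names (`update_multiplier`, `penalized_return`, `tolerance`) are hypothetical.

```python
# Illustrative sketch (not the paper's implementation): an adaptive
# constraint strength realized as a Lagrange multiplier, as commonly
# done in constrained RL. All names are hypothetical.

def update_multiplier(lam: float, violation: float,
                      tolerance: float = 0.0, lr: float = 0.1) -> float:
    """Dual-ascent step: the multiplier grows while the average
    state-constraint violation exceeds the tolerance, and is clamped
    at zero so the penalty never becomes a bonus."""
    return max(0.0, lam + lr * (violation - tolerance))

def penalized_return(reward: float, cost: float, lam: float) -> float:
    """Objective seen by the policy: reward minus the adaptively
    weighted safety cost."""
    return reward - lam * cost

# Toy training loop: per-iteration average violations shrink to zero,
# so the multiplier rises at first and stops growing once violations vanish.
lam = 0.0
for violation in [0.5, 0.4, 0.2, 0.0, 0.0]:
    lam = update_multiplier(lam, violation)
```

The constraint strength thus tracks how unsafe the current policy is, which mirrors the adaptive behavior the abstract attributes to the safety module.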
Pages: 9966-9983
Page count: 18