Trajectory Planning for Autonomous Vehicles Using Hierarchical Reinforcement Learning

Cited by: 29
Authors
Ben Naveed, Kaleb [1]
Qiao, Zhiqian [2 ]
Dolan, John M. [3 ]
Affiliations
[1] Hong Kong Polytech Univ, Student Elect & Informat Engn, Hong Kong, Peoples R China
[2] Carnegie Mellon Univ, Elect & Comp Engn, Pittsburgh, PA 15213 USA
[3] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
Source
2021 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC) | 2021
Keywords
Trajectory Planning; Hierarchical Deep Reinforcement Learning; Double Deep Q-Learning; PID controller
DOI
10.1109/ITSC48978.2021.9564634
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Planning safe trajectories under uncertain and dynamic conditions makes the autonomous driving problem significantly complex. Current heuristic-based algorithms such as the slot-based method rely heavily on hand-engineered parameters and are restricted to specific scenarios. Supervised learning methods such as Imitation Learning lack generalization and safety guarantees. To address these problems and ensure a robust framework, we propose a Robust Hierarchical Reinforcement Learning (HRL) framework for learning autonomous driving policies. We adapt a state-of-the-art algorithm, hierarchical Double Deep Q-learning (h-DDQN), and make the framework robust by (1) casting the choice of driving maneuver as a high-level option; (2) having the lower-level controller output waypoint trajectories that are tracked with a Proportional-Integral-Derivative (PID) controller, rather than direct acceleration/steering actions; and (3) adding a Long Short-Term Memory (LSTM) layer to the network to mitigate the effects of observation noise and dynamic driving behaviors. To improve sample efficiency, we also use a Hybrid Reward Mechanism and Reward-Driven Exploration. Results from the high-fidelity CARLA simulator on several interactive lane-change scenarios indicate that, compared with traditional RL approaches, the proposed framework reduces convergence time, generates smoother trajectories, and better handles dynamic surroundings and noisy observations.
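
The abstract outlines a two-level design: a high-level DDQN option picks a maneuver, and the lower level emits waypoints that a PID controller tracks. The following is a minimal Python sketch of the waypoint-tracking step only, not the authors' implementation; the PID class, gains, pose/waypoint formats, and track_waypoint helper are all hypothetical.

    import math

    class PID:
        """Simple PID controller. Gains and timestep are illustrative, not from the paper."""

        def __init__(self, kp, ki, kd, dt=0.05):
            self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
            self.integral = 0.0
            self.prev_error = 0.0

        def step(self, error):
            self.integral += error * self.dt
            derivative = (error - self.prev_error) / self.dt
            self.prev_error = error
            return self.kp * error + self.ki * self.integral + self.kd * derivative

    def track_waypoint(pose, waypoint, steer_pid, speed_pid, target_speed):
        """Turn the lower-level policy's next waypoint into steering/throttle commands.

        pose = (x, y, yaw, speed) and waypoint = (x, y) are assumed formats.
        """
        x, y, yaw, speed = pose
        wx, wy = waypoint
        # Heading error: bearing to the waypoint minus current yaw, wrapped to [-pi, pi].
        heading_error = math.atan2(wy - y, wx - x) - yaw
        heading_error = math.atan2(math.sin(heading_error), math.cos(heading_error))
        steer = steer_pid.step(heading_error)
        throttle = speed_pid.step(target_speed - speed)
        # Clamp to typical simulator command ranges.
        return max(-1.0, min(1.0, steer)), max(0.0, min(1.0, throttle))

    # Example tick with placeholder gains.
    steer_pid, speed_pid = PID(1.5, 0.0, 0.2), PID(0.8, 0.1, 0.05)
    steer, throttle = track_waypoint((0.0, 0.0, 0.0, 5.0), (10.0, 1.0),
                                     steer_pid, speed_pid, target_speed=8.0)

In a CARLA-style control loop, the steer/throttle pair would be applied each tick while the hierarchical DDQN (omitted here) refreshes the maneuver and waypoints at a slower rate; decoupling the learned policy from low-level actuation is what lets the PID layer smooth the trajectories.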
Pages: 601-606
Number of pages: 6