BEV-TP: End-to-End Visual Perception and Trajectory Prediction for Autonomous Driving

Cited by: 2
Authors
Lang, Bo [1 ]
Li, Xin [2 ]
Chuah, Mooi Choo [1 ]
Affiliations
[1] Lehigh Univ, Comp Sci & Engn, Bethlehem, PA 18015 USA
[2] Qualcomm Technol Inc, Qualcomm AI Res, San Diego, CA 92121 USA
Funding
US National Science Foundation;
Keywords
Three-dimensional displays; Trajectory; Transformers; Feature extraction; Visualization; Object detection; Task analysis; Vision-based; end-to-end perception and prediction; autonomous driving; TRANSFORMER; TRACKING;
DOI
10.1109/TITS.2024.3433591
Chinese Library Classification (CLC)
TU [Building Science];
Discipline classification code
0813;
Abstract
For autonomous vehicles (AVs), effective end-to-end perception and future trajectory prediction are critical for planning safe maneuvers. In current AV systems, perception and prediction are two separate modules, and the prediction module receives only a restricted amount of information from the perception module. Furthermore, perception errors propagate into the prediction module, ultimately degrading the accuracy of the prediction results. In this paper, we present a novel framework termed BEV-TP, a visual context-guided, center-based transformer network for joint 3D perception and trajectory prediction. BEV-TP exploits visual information from consecutive multi-view images and context information from HD semantic maps to better predict objects' centers, whose locations are then used to query visual features and context features via the attention mechanism. The resulting agent queries and map queries facilitate learning in the transformer module for further feature aggregation. Finally, multiple regression heads perform 3D bounding box detection and future velocity prediction. This center-based approach yields a differentiable, simple, and efficient end-to-end trajectory prediction framework. Extensive experiments conducted on the nuScenes dataset demonstrate the effectiveness of BEV-TP over traditional pipelines with sequential paradigms.
Pages: 18537-18546
Page count: 10
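
To make the center-based query-and-aggregate idea from the abstract concrete, below is a minimal PyTorch-style sketch. The module name (CenterQueryAggregator), feature dimensions, and the use of standard multi-head attention plus a plain transformer encoder are illustrative assumptions only; they are not taken from the BEV-TP implementation described in the paper.

```python
# Illustrative sketch: query visual BEV features and HD-map context features at
# predicted object centers, fuse them with a transformer, then regress 3D boxes
# and future velocities. All sizes and layer choices are assumptions.
import torch
import torch.nn as nn


class CenterQueryAggregator(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_layers=3):
        super().__init__()
        self.center_embed = nn.Linear(2, dim)          # embed (x, y) center locations
        self.visual_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.map_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        encoder_layer = nn.TransformerEncoderLayer(dim, num_heads, batch_first=True)
        self.fusion = nn.TransformerEncoder(encoder_layer, num_layers)
        self.box_head = nn.Linear(dim, 7)              # (x, y, z, w, l, h, yaw)
        self.vel_head = nn.Linear(dim, 2)              # future (vx, vy)

    def forward(self, centers, bev_feats, map_feats):
        # centers:   (B, N, 2)  predicted object centers in BEV coordinates
        # bev_feats: (B, Lv, C) flattened visual BEV features
        # map_feats: (B, Lm, C) flattened HD-map context features
        agent_q = self.center_embed(centers)
        vis, _ = self.visual_attn(agent_q, bev_feats, bev_feats)   # agent queries over visual features
        ctx, _ = self.map_attn(agent_q, map_feats, map_feats)      # map queries over context features
        fused = self.fusion(agent_q + vis + ctx)                   # transformer feature aggregation
        return self.box_head(fused), self.vel_head(fused)


if __name__ == "__main__":
    B, N, C = 2, 10, 256
    model = CenterQueryAggregator(dim=C)
    boxes, vels = model(torch.randn(B, N, 2), torch.randn(B, 1600, C), torch.randn(B, 400, C))
    print(boxes.shape, vels.shape)  # torch.Size([2, 10, 7]) torch.Size([2, 10, 2])
```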