BEV-TP: End-to-End Visual Perception and Trajectory Prediction for Autonomous Driving

Cited by: 2
Authors
Lang, Bo [1 ]
Li, Xin [2 ]
Chuah, Mooi Choo [1 ]
Affiliations
[1] Lehigh Univ, Comp Sci & Engn, Bethlehem, PA 18015 USA
[2] Qualcomm Technol Inc, Qualcomm AI Res, San Diego, CA 92121 USA
Funding
U.S. National Science Foundation
Keywords
Three-dimensional displays; Trajectory; Transformers; Feature extraction; Visualization; Object detection; Task analysis; Vision-based; end-to-end perception and prediction; autonomous driving; TRANSFORMER; TRACKING
DOI
10.1109/TITS.2024.3433591
Chinese Library Classification (CLC)
TU [Building Science]
Discipline Code
0813
Abstract
For autonomous vehicles (AVs), effective end-to-end perception and future trajectory prediction are critical for planning safe maneuvers. In current AV systems, perception and prediction are two separate modules: the prediction module receives only a restricted amount of information from the perception module, and perception errors propagate into the prediction module, ultimately degrading the accuracy of the predictions. In this paper, we present BEV-TP, a novel visual context-guided, center-based transformer network for joint 3D perception and trajectory prediction. BEV-TP exploits visual information from consecutive multi-view images and context information from HD semantic maps to better predict objects' centers, whose locations are then used to query visual features and context features via an attention mechanism. The resulting agent queries and map queries guide the transformer module in further feature aggregation. Finally, multiple regression heads perform 3D bounding box detection and future velocity prediction. This center-based approach yields a differentiable, simple, and efficient end-to-end trajectory prediction framework. Extensive experiments on the nuScenes dataset demonstrate the effectiveness of BEV-TP over traditional pipelines with sequential paradigms.
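To make the abstract's pipeline concrete, below is a minimal conceptual sketch (not the authors' implementation) of a center-based query head: agent queries derived from predicted object centers attend over visual BEV features and HD-map context features, and regression heads output 3D box parameters and future velocities. All module names, tensor shapes, and head layouts are illustrative assumptions.

```python
# Hypothetical sketch of a center-based query/attention head in the spirit of
# BEV-TP's description; dimensions and outputs are assumptions, not the paper's code.
import torch
import torch.nn as nn


class CenterQueryHead(nn.Module):
    def __init__(self, dim=256, num_heads=8, num_future_steps=6):
        super().__init__()
        # Cross-attention: agent queries attend to flattened visual BEV features.
        self.visual_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Cross-attention: agent queries attend to encoded semantic-map elements.
        self.map_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.box_head = nn.Linear(dim, 7)                     # x, y, z, w, l, h, yaw
        self.vel_head = nn.Linear(dim, 2 * num_future_steps)  # future (vx, vy) per step

    def forward(self, agent_queries, bev_feats, map_feats):
        # agent_queries: (B, N_agents, C), built from predicted object centers
        # bev_feats:     (B, H*W, C), flattened visual BEV features
        # map_feats:     (B, M, C), encoded HD-map context features
        q, _ = self.visual_attn(agent_queries, bev_feats, bev_feats)
        q, _ = self.map_attn(q, map_feats, map_feats)
        boxes = self.box_head(q)   # 3D bounding-box parameters per agent
        vel = self.vel_head(q)     # future velocities, reshaped per time step
        return boxes, vel.view(*vel.shape[:2], -1, 2)


if __name__ == "__main__":
    head = CenterQueryHead()
    boxes, future_vel = head(
        torch.randn(1, 20, 256),          # 20 agent queries
        torch.randn(1, 200 * 200, 256),   # 200x200 BEV grid
        torch.randn(1, 50, 256),          # 50 map elements
    )
    print(boxes.shape, future_vel.shape)  # (1, 20, 7) (1, 20, 6, 2)
```

Integrating the predicted velocities over the prediction horizon would then yield future trajectories, which is one plausible reading of how a center-based design keeps the whole pipeline differentiable.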
Pages: 18537-18546
Page count: 10