TransFuser: Imitation With Transformer-Based Sensor Fusion for Autonomous Driving

Cited by: 105
Authors
Chitta, Kashyap [1 ,2 ]
Prakash, Aditya [3 ]
Jaeger, Bernhard [1 ,2 ]
Yu, Zehao [1 ,2 ]
Renz, Katrin [1 ,2 ]
Geiger, Andreas [1 ,2 ]
Affiliations
[1] Univ Tubingen, Autonomous Vis Grp, D-72074 Tubingen, Germany
[2] Max Planck Inst Intelligent Syst, D-72076 Tubingen, Germany
[3] Univ Illinois, Champaign, IL 61820 USA
Keywords
Laser radar; Transformers; Three-dimensional displays; Semantics; Sensor fusion; Cameras; Autonomous vehicles; Attention; autonomous driving; imitation learning; sensor fusion; transformers
DOI
10.1109/TPAMI.2022.3200245
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
How should we integrate representations from complementary sensors for autonomous driving? Geometry-based fusion has shown promise for perception (e.g., object detection, motion forecasting). However, in the context of end-to-end driving, we find that imitation learning based on existing sensor fusion methods underperforms in complex driving scenarios with a high density of dynamic agents. Therefore, we propose TransFuser, a mechanism to integrate image and LiDAR representations using self-attention. Our approach uses transformer modules at multiple resolutions to fuse perspective view and bird's eye view feature maps. We experimentally validate its efficacy on a challenging new benchmark with long routes and dense traffic, as well as the official leaderboard of the CARLA urban driving simulator. At the time of submission, TransFuser outperforms all prior work on the CARLA leaderboard in terms of driving score by a large margin. Compared to geometry-based fusion, TransFuser reduces the average collisions per kilometer by 48%.
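The core idea in the abstract — letting image and LiDAR bird's-eye-view feature maps exchange information through self-attention — can be illustrated with a minimal sketch. This is not the paper's actual architecture (TransFuser applies transformer modules at multiple resolutions inside convolutional backbones); it is a single-layer toy in NumPy with hypothetical shapes and randomly initialized projection matrices, showing only the mechanism: flatten each modality's feature grid into tokens, concatenate, and run joint self-attention so every token attends across both modalities.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def fuse_tokens(img_feat, lidar_feat, Wq, Wk, Wv):
    """One self-attention pass over the concatenated image and LiDAR
    token sequences; each token attends to tokens of both modalities."""
    tokens = np.concatenate([img_feat, lidar_feat], axis=0)   # (N_img + N_lidar, C)
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv           # linear projections
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))            # scaled dot-product attention
    out = attn @ v
    n_img = img_feat.shape[0]
    return out[:n_img], out[n_img:]                           # fused per-modality features

# Toy example: an 8x8 perspective-view grid and an 8x8 BEV grid,
# each flattened to 64 tokens with 16 channels (shapes are illustrative).
rng = np.random.default_rng(0)
C = 16
img = rng.standard_normal((64, C))
lidar = rng.standard_normal((64, C))
Wq, Wk, Wv = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
img_fused, lidar_fused = fuse_tokens(img, lidar, Wq, Wk, Wv)
print(img_fused.shape, lidar_fused.shape)  # (64, 16) (64, 16)
```

In the paper this exchange happens repeatedly at several feature-map resolutions, with the fused tokens reshaped back into grids and added to the backbone features; the sketch omits positional embeddings, multi-head attention, and the feed-forward sublayer that a full transformer block would include.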
Pages: 12878-12895
Page count: 18