SuperFusion: Multilevel LiDAR-Camera Fusion for Long-Range HD Map Generation

Cited by: 6
Authors
Dong, Hao [1 ]
Gu, Weihao [2 ]
Zhang, Xianjing [2 ]
Xu, Jintao [2 ]
Ai, Rui [2 ]
Lu, Huimin [3 ]
Kannala, Juho [4 ]
Chen, Xieyuanli [3 ]
Affiliations
[1] Swiss Fed Inst Technol, Zurich, Switzerland
[2] HAOMO AI, Beijing, Peoples R China
[3] Natl Univ Def Technol, Changsha, Peoples R China
[4] Aalto Univ, Espoo, Finland
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024
DOI
10.1109/ICRA57147.2024.10611320
CLC number
TP [Automation & Computer Technology]
Subject classification
0812
Abstract
High-definition (HD) semantic map generation of the environment is an essential component of autonomous driving. Existing methods have achieved good performance on this task by fusing different sensor modalities, such as LiDAR and camera. However, current works fuse only at the raw-data or network-feature level and consider only short-range HD map generation, limiting their deployment in realistic autonomous driving applications. In this paper, we focus on the task of building HD maps at short range, i.e., within 30 m, and also predicting long-range HD maps up to 90 m, which is required by downstream path planning and control tasks to improve the smoothness and safety of autonomous driving. To this end, we propose a novel network named SuperFusion, exploiting the fusion of LiDAR and camera data at multiple levels. We use LiDAR depth to improve image depth estimation and use image features to guide long-range LiDAR feature prediction. We benchmark SuperFusion on the nuScenes dataset and a self-recorded dataset and show that it outperforms the state-of-the-art baseline methods by large margins on all intervals. Additionally, we apply the generated HD map to a downstream path planning task, demonstrating that the long-range HD maps predicted by our method can lead to better path planning for autonomous vehicles. Our code and self-recorded dataset have been released at https://github.com/haomo-ai/SuperFusion.
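The abstract's data-level fusion step, using LiDAR depth to aid image depth estimation, typically starts by projecting LiDAR points into the camera image to form a sparse depth map. The following is a minimal sketch of that projection under common assumptions (pinhole camera, points already transformed into the camera frame); the function name and intrinsics are illustrative, not taken from the paper.

```python
import numpy as np

def lidar_to_sparse_depth(points_cam, K, h, w):
    """Project LiDAR points (N x 3, camera frame) through intrinsics K
    into an h x w image, producing a sparse depth map (0 = no return)."""
    z = points_cam[:, 2]
    pts = points_cam[z > 0]          # keep points in front of the camera
    uv = (K @ pts.T).T               # perspective projection
    uv = uv[:, :2] / uv[:, 2:3]      # normalize by depth
    u = np.round(uv[:, 0]).astype(int)
    v = np.round(uv[:, 1]).astype(int)
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth = np.zeros((h, w), dtype=np.float32)
    for ui, vi, zi in zip(u[inside], v[inside], pts[inside, 2]):
        # keep the nearest return when several points land on one pixel
        if depth[vi, ui] == 0 or zi < depth[vi, ui]:
            depth[vi, ui] = zi
    return depth
```

Such a sparse map can then serve as supervision or an extra input channel for a monocular depth network; the paper's actual fusion architecture is described in the full text.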
Pages: 9056-9062
Page count: 7