BEV-CFKT: A LiDAR-camera cross-modality-interaction fusion and knowledge transfer framework with transformer for BEV 3D object detection

Cited by: 4

Authors:

Wei, Ming

Li, Jiachen

Kang, Hongyi

Huang, Yijie

Lu, Jun-Guo [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China

Source:

NEUROCOMPUTING | 2024 / Volume 582

Funding:

National Natural Science Foundation of China;

Keywords:

LiDAR-Camera fusion; BEV; Cross-modality interaction; Knowledge transfer; Transformer; 3D detection; Temporal fusion;

DOI:

10.1016/j.neucom.2024.127527

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The BEV-CFKT framework proposed in this paper uses transformer-based knowledge transfer for LiDAR-camera fusion in the Bird's-Eye-View (BEV) space, aiming at accurate and robust 3D object detection. BEV-CFKT comprises three main components: generation of BEV features from images and point clouds, cross-modality interaction, and hybrid object queries using a monocular detection head. By unifying features from both point clouds and images in the BEV space, we simplify modal interaction, facilitate knowledge transfer, and extract richer structural and semantic information from multimodal data, which improves the network's performance. To further improve detection performance, BEV-CFKT incorporates a temporal fusion module. Additionally, a hybrid object queries module based on a monocular detection head accelerates the convergence of our model. We demonstrate the effectiveness of our approach through an extensive set of experiments.
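
As a rough illustration of why a shared BEV grid simplifies modal interaction, the sketch below applies generic scaled dot-product cross-attention between two flattened BEV feature maps, so that each modality can attend to the other cell by cell. It is not the BEV-CFKT implementation: the abstract does not specify the interaction layers, and the names `cross_attention` and `fuse_bev`, the residual addition, the channel concatenation, and the toy feature values are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Scaled dot-product cross-attention.

    queries: one feature vector per BEV cell of the attending modality.
    keys, values: one feature vector per BEV cell of the other modality.
    Returns one attended vector per query cell.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    attended = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        attended.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return attended


def fuse_bev(camera_bev, lidar_bev):
    """Hypothetical bidirectional interaction (an assumption, not the paper's design).

    Each modality attends to the other, the attended context is added back
    as a residual, and the two updated maps are concatenated per BEV cell.
    """
    cam_ctx = cross_attention(camera_bev, lidar_bev, lidar_bev)
    lidar_ctx = cross_attention(lidar_bev, camera_bev, camera_bev)
    fused = []
    for cam, c_ctx, lid, l_ctx in zip(camera_bev, cam_ctx, lidar_bev, lidar_ctx):
        cam_out = [a + b for a, b in zip(cam, c_ctx)]
        lid_out = [a + b for a, b in zip(lid, l_ctx)]
        fused.append(cam_out + lid_out)
    return fused


# Toy 2x2 BEV grid flattened to 4 cells with 2 channels per modality.
camera_bev = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
lidar_bev = [[0.0, 2.0], [2.0, 0.0], [1.0, 1.0], [0.0, 0.0]]

fused = fuse_bev(camera_bev, lidar_bev)
print(len(fused), len(fused[0]))  # 4 cells, 4 channels after concatenation
```

Because both inputs live on the same grid, the fused map keeps one vector per BEV cell, and a detection head could consume it without any per-sensor coordinate conversion. Real systems would use learned projections and multi-head attention from a deep learning library rather than plain lists.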

References

Pages: 11

53 entries in total

[1] TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers [J]. Bai, Xuyang; Hu, Zeyu; Zhu, Xinge; Huang, Qingqiu; Chen, Yilun; Fu, Hangbo; Tai, Chiew-Lan. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 1080-1089

[2] nuScenes: A multimodal dataset for autonomous driving [J]. Caesar, Holger; Bankiti, Varun; Lang, Alex H.; Vora, Sourabh; Liong, Venice Erin; Xu, Qiang; Krishnan, Anush; Pan, Yu; Baldan, Giancarlo; Beijbom, Oscar. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 11618-11628

[3] Cai HX, 2023, arXiv:2303.17099

[4] End-to-End Object Detection with Transformers [J]. Carion, Nicolas; Massa, Francisco; Synnaeve, Gabriel; Usunier, Nicolas; Kirillov, Alexander; Zagoruyko, Sergey. COMPUTER VISION - ECCV 2020, PT I, 2020, 12346: 213-229

[5] MSL3D: 3D object detection from monocular, stereo and point cloud for autonomous driving [J]. Chen, Wenyu; Li, Peixuan; Zhao, Huaici. NEUROCOMPUTING, 2022, 494: 23-32

[6] FUTR3D: A Unified Sensor Fusion Framework for 3D Detection [J]. Chen, Xuanyao; Zhang, Tianyuan; Wang, Yue; Wang, Yilun; Zhao, Hang. 2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, CVPRW, 2023: 172-181

[7] Chen ZH, 2022, arXiv:2211.09386, DOI 10.48550/arXiv.2211.09386

[8] MMDetection3D Contributors, 2020, MMDetection3D: OpenMMLab Next-Generation Platform for General 3D Object Detection

[9] Image guidance based 3D vehicle detection in traffic scene [J]. Dai, Deyun; Wang, Jikai; Chen, Zonghai; Zhao, Hao. NEUROCOMPUTING, 2021, 428: 1-11

[10] Dynamic DETR: End-to-End Object Detection with Dynamic Attention [J]. Dai, Xiyang; Chen, Yinpeng; Yang, Jianwei; Zhang, Pengchuan; Yuan, Lu; Zhang, Lei. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 2968-2977
