BEV-CFKT: A LiDAR-camera cross-modality-interaction fusion and knowledge transfer framework with transformer for BEV 3D object detection

Cited by: 2
Authors
Wei, Ming
Li, Jiachen
Kang, Hongyi
Huang, Yijie
Lu, Jun-Guo [1 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
LiDAR-Camera fusion; BEV; Cross-modality interaction; Knowledge transfer; Transformer; 3D detection; Temporal fusion;
DOI
10.1016/j.neucom.2024.127527
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
The BEV-CFKT framework proposed in this paper leverages knowledge transfer through transformers for LiDAR-camera fusion in the Bird's-Eye-View (BEV) space, aiming to achieve accurate and robust 3D object detection. BEV-CFKT comprises three main components: the generation of BEV features from images and point clouds, cross-modality interaction, and hybrid object queries based on a monocular detection head. By unifying features from both point clouds and images in the BEV space, we simplify modal interaction, facilitate knowledge transfer, and extract richer structural and semantic information from multimodal data, which effectively enhances the network's performance. To further improve detection performance, BEV-CFKT incorporates a temporal fusion module. In addition, the hybrid object queries module based on a monocular detection head accelerates the convergence of our model. We demonstrate the effectiveness of our approach through an extensive set of experiments.
Pages: 11
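
As a concrete illustration of the cross-modality interaction step described in the abstract, below is a minimal, hypothetical sketch of how camera and LiDAR BEV feature maps could be fused with transformer-style cross-attention once both modalities share a common BEV grid. All class, method, and tensor names (CrossModalityBEVFusion, bev_cam, bev_lidar) and the channel and grid sizes are assumptions made for illustration only; they are not taken from the paper's implementation.

import torch
import torch.nn as nn


class CrossModalityBEVFusion(nn.Module):
    """Fuse camera and LiDAR BEV feature maps with cross-attention.

    Assumes both modalities have already been projected into a common
    BEV grid of shape (B, C, H, W); names and sizes are illustrative.
    """

    def __init__(self, channels: int = 256, num_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(channels)
        self.norm2 = nn.LayerNorm(channels)
        self.ffn = nn.Sequential(
            nn.Linear(channels, channels * 4),
            nn.ReLU(inplace=True),
            nn.Linear(channels * 4, channels),
        )

    def forward(self, bev_cam: torch.Tensor, bev_lidar: torch.Tensor) -> torch.Tensor:
        b, c, h, w = bev_lidar.shape
        # Flatten each BEV grid into a token sequence: (B, H*W, C).
        q = bev_lidar.flatten(2).transpose(1, 2)
        kv = bev_cam.flatten(2).transpose(1, 2)
        # LiDAR BEV tokens attend to camera BEV tokens (cross-modality interaction).
        attended, _ = self.cross_attn(q, kv, kv)
        x = self.norm1(q + attended)
        x = self.norm2(x + self.ffn(x))
        # Restore the spatial BEV layout: (B, C, H, W).
        return x.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    fusion = CrossModalityBEVFusion(channels=256)
    bev_cam = torch.randn(1, 256, 32, 32)    # camera-derived BEV features
    bev_lidar = torch.randn(1, 256, 32, 32)  # LiDAR-derived BEV features
    fused = fusion(bev_cam, bev_lidar)
    print(fused.shape)  # torch.Size([1, 256, 32, 32])

This sketch only covers one direction of interaction (LiDAR queries attending to camera features); a symmetric camera-to-LiDAR pass, the temporal fusion module, and the monocular-head-based hybrid object queries mentioned in the abstract are not modeled here.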