Rethinking the Late Fusion of LiDAR-Camera Based 3D Object Detection

Times Cited: 0
Authors
Yu, Lehang [1 ,2 ]
Zhang, Jing [1 ,2 ]
Liu, Zhong [1 ,2 ]
Yue, Haosong [1 ,2 ]
Chen, Weihai [1 ,2 ]
Affiliations
[1] Beihang Univ, Sch Automat Sci & Elect Engn, Beijing, Peoples R China
[2] Beihang Univ, Hangzhou Innovat Inst, Hangzhou, Peoples R China
Source
2024 IEEE 19TH CONFERENCE ON INDUSTRIAL ELECTRONICS AND APPLICATIONS, ICIEA 2024 | 2024
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China
Keywords
3D object detection; late fusion; semantic segmentation; VoxelNet
DOI
10.1109/ICIEA61579.2024.10665299
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
3D object detection plays an important role in autonomous driving. In contrast to early methods that rely on single-modality data, multi-modality detectors, mostly LiDAR-camera based, have been widely studied and proposed in recent years. Fusion methods can be divided into three categories: early fusion, which fuses raw data from different modalities; middle fusion, which fuses the extracted multi-modal features through feature alignment; and late fusion, which fuses at the instance level after the detectors of each modality have produced their individual predictions. Current methods mainly focus on early and middle fusion, neglecting the large potential of late fusion. This paper shows that the performance of a 3D object detector can be improved by applying our proposed semantic consistency filter (SCF), a plug-and-play late fusion strategy, to existing models. SCF removes incorrect prediction boxes by computing a semantic inconsistency rate (SIR) between each box and its corresponding 2D semantic mask generated by a semantic segmentation network. Extensive experiments with several different baselines demonstrate the effectiveness and versatility of SCF, indicating that late fusion may be key to improving the performance of 3D object detectors.
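The record contains only the abstract, so the following is a minimal, hypothetical sketch of how such a semantic consistency filter might operate: each predicted 3D box is projected into the image, the 2D semantic mask is sampled inside the projected footprint, and the box is discarded when the fraction of pixels disagreeing with the box's predicted class (an assumed definition of SIR) exceeds a threshold. All function names, the projection convention, and the threshold value are illustrative assumptions, not the authors' published implementation.

```python
# Illustrative sketch only: SIR definition, projection convention, and threshold
# are assumptions for this example, not the paper's published method.
import numpy as np

def project_box_footprint(corners_3d, intrinsics, extrinsics, mask_shape):
    """Project the 8 corners of a 3D box into the image and return the
    axis-aligned 2D footprint (u_min, v_min, u_max, v_max), clipped to the image."""
    pts_cam = np.hstack([corners_3d, np.ones((8, 1))]) @ extrinsics.T   # (8, 3), extrinsics assumed 3x4 world-to-camera
    uv = pts_cam @ intrinsics.T
    uv = uv[:, :2] / np.clip(uv[:, 2:3], 1e-6, None)                    # perspective division
    h, w = mask_shape
    u_min, v_min = np.clip(uv.min(axis=0), 0, [w - 1, h - 1]).astype(int)
    u_max, v_max = np.clip(uv.max(axis=0), 0, [w - 1, h - 1]).astype(int)
    return u_min, v_min, u_max, v_max

def semantic_inconsistency_rate(box_class, footprint, semantic_mask):
    """Assumed SIR: fraction of pixels inside the projected footprint whose
    semantic label disagrees with the class predicted for the 3D box."""
    u_min, v_min, u_max, v_max = footprint
    region = semantic_mask[v_min:v_max + 1, u_min:u_max + 1]
    if region.size == 0:
        return 1.0                                                      # box projects outside the image
    return float(np.mean(region != box_class))

def semantic_consistency_filter(boxes, classes, semantic_mask,
                                intrinsics, extrinsics, sir_threshold=0.7):
    """Keep only 3D predictions whose SIR stays below a threshold (value assumed)."""
    kept = []
    for corners_3d, cls in zip(boxes, classes):
        fp = project_box_footprint(corners_3d, intrinsics, extrinsics, semantic_mask.shape)
        if semantic_inconsistency_rate(cls, fp, semantic_mask) < sir_threshold:
            kept.append((corners_3d, cls))
    return kept
```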
Pages: 6