Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

被引:1
作者
Tang, Yingjuan [1 ]
He, Hongwen [1 ]
Wang, Yong [1 ]
Wu, Jingda [2 ]
机构
[1] Beijing Inst Technol, Sch Mech Engn, Beijing 100081, Peoples R China
[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
关键词
Autonomous driving; 3D object detection; Multi-modal; Sparse convolutional networks; Point cloud and image fusion; Homogeneous fusion;
D O I
10.1016/j.eswa.2024.124945
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
LiDAR-only 3D detection methods struggle with the sparsity of point clouds. To overcome this issue, multi- modal methods have been proposed, but their fusion is a challenge due to the heterogeneous representation of images and point clouds. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds from depth completion. The proposed framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-responding foreground features from sparse point cloud data. This module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality consistency fusion. Finally, the proposed HS-Fusion can simultaneously combine 2D image features and 3D geometric features of pseudo point clouds using multi-representation feature extraction. The proposed network has been found to attain better performance on the 3D object detection benchmarks. In particular, the proposed model demonstrates a 4.02% improvement in accuracy compared to the pure model. Moreover, its inference speed surpasses that of other models, thus further validating the efficacy of HS-Fusion.
引用
收藏
页数:12
相关论文
共 52 条
  • [11] Ku J, 2018, IEEE INT C INT ROBOT, P5750, DOI 10.1109/IROS.2018.8594049
  • [12] PointPillars: Fast Encoders for Object Detection from Point Clouds
    Lang, Alex H.
    Vora, Sourabh
    Caesar, Holger
    Zhou, Lubing
    Yang, Jiong
    Beijbom, Oscar
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 12689 - 12697
  • [13] Enhancing Multi-modal Features Using Local Self-attention for 3D Object Detection
    Li, Hao
    Zhang, Zehan
    Zhao, Xian
    Wang, Yulong
    Shen, Yuxi
    Pu, Shiliang
    Mao, Hui
    [J]. COMPUTER VISION, ECCV 2022, PT X, 2022, 13670 : 532 - 549
  • [14] Homogeneous Multi-modal Feature Fusion and Interaction for 3D Object Detection
    Li, Xin
    Shi, Botian
    Hou, Yuenan
    Wu, Xingjiao
    Ma, Tianlong
    Li, Yikang
    He, Liang
    [J]. COMPUTER VISION, ECCV 2022, PT XXXVIII, 2022, 13698 : 691 - 707
  • [15] Voxel Field Fusion for 3D Object Detection
    Li, Yanwei
    Qi, Xiaojuan
    Chen, Yukang
    Wang, Liwei
    Li, Zeming
    Sun, Jian
    Jia, Jiaya
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 1110 - 1119
  • [16] DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection
    Li, Yingwei
    Yu, Adams Wei
    Meng, Tianjian
    Caine, Ben
    Ngiam, Jiquan
    Peng, Daiyi
    Shen, Junyang
    Lu, Yifeng
    Zhou, Denny
    Le, Quoc, V
    Yuille, Alan
    Tan, Mingxing
    [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 17161 - 17170
  • [17] Multi-Task Multi-Sensor Fusion for 3D Object Detection
    Liang, Ming
    Yang, Bin
    Chen, Yun
    Hu, Rui
    Urtasun, Raquel
    [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 7337 - 7345
  • [18] Lin YK, 2022, AAAI CONF ARTIF INTE, P1638
  • [19] Liu Jiaming, 2022, Advances in Neural Information Processing Systems
  • [20] EPNet plus plus : Cascade Bi-Directional Fusion for Multi-Modal 3D Object Detection
    Liu, Zhe
    Huang, Tengteng
    Li, Bingling
    Chen, Xiwu
    Wang, Xi
    Bai, Xiang
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (07) : 8324 - 8341