Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

Cited by: 1
Authors
Tang, Yingjuan [1 ]
He, Hongwen [1 ]
Wang, Yong [1 ]
Wu, Jingda [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Mech Engn, Beijing 100081, Peoples R China
[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
Autonomous driving; 3D object detection; Multi-modal; Sparse convolutional networks; Point cloud and image fusion; Homogeneous fusion
DOI
10.1016/j.eswa.2024.124945
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
LiDAR-only 3D detection methods struggle with the sparsity of point clouds. Multi-modal methods have been proposed to overcome this issue, but their fusion is challenging because images and point clouds have heterogeneous representations. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds via depth completion. The framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-response foreground features from sparse point cloud data; this module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality-consistent fusion. Finally, HS-Fusion simultaneously combines 2D image features with the 3D geometric features of the pseudo point clouds through multi-representation feature extraction. The proposed network attains better performance on 3D object detection benchmarks; in particular, it improves accuracy by 4.02% over the LiDAR-only baseline, and its inference speed surpasses that of other models, further validating the efficacy of HS-Fusion.
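The abstract only describes the modules at a high level. As a rough illustration of the general idea behind a foreground-aware extractor over sparse voxel data, the following is a minimal, hypothetical sketch (not the authors' implementation): per-voxel features are scored by a small classifier head and only the highest-responding (likely foreground) voxels are kept for the downstream sparse backbone. All class names, shapes, and the keep-ratio mechanism are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ForegroundAwareSelector(nn.Module):
    """Hypothetical sketch of foreground-aware voxel selection.

    Scores each active voxel's feature vector and keeps the top-k
    highest-responding voxels. This is NOT the paper's actual module;
    names and shapes are assumptions for illustration only.
    """

    def __init__(self, channels: int, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Small head predicting a per-voxel foreground score.
        self.score_head = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 2, 1),
        )

    def forward(self, feats: torch.Tensor, coords: torch.Tensor):
        # feats:  (N, C) features of N active (non-empty) voxels
        # coords: (N, 4) voxel indices, e.g. (batch, z, y, x)
        scores = self.score_head(feats).squeeze(-1)   # (N,) foreground logits
        k = max(1, int(feats.shape[0] * self.keep_ratio))
        top_idx = torch.topk(scores, k).indices        # high-response voxels
        return feats[top_idx], coords[top_idx], scores.sigmoid()


if __name__ == "__main__":
    N, C = 1000, 64
    selector = ForegroundAwareSelector(C, keep_ratio=0.3)
    feats = torch.randn(N, C)
    coords = torch.randint(0, 100, (N, 4))
    kept_feats, kept_coords, probs = selector(feats, coords)
    print(kept_feats.shape, kept_coords.shape)  # (300, 64) and (300, 4)
```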
Pages: 12