Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

Cited by: 1
Authors
Tang, Yingjuan [1 ]
He, Hongwen [1 ]
Wang, Yong [1 ]
Wu, Jingda [2 ]
Affiliations
[1] Beijing Inst Technol, Sch Mech Engn, Beijing 100081, Peoples R China
[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore
Keywords
Autonomous driving; 3D object detection; Multi-modal; Sparse convolutional networks; Point cloud and image fusion; Homogeneous fusion
DOI
10.1016/j.eswa.2024.124945
CLC number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
LiDAR-only 3D detection methods struggle with the sparsity of point clouds. Multi-modal methods have been proposed to overcome this issue, but their fusion is challenging because images and point clouds have heterogeneous representations. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds via depth completion. The framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-response foreground features from sparse point cloud data; this module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality-consistent fusion. Finally, HS-Fusion simultaneously combines 2D image features with the 3D geometric features of the pseudo point clouds through multi-representation feature extraction. The proposed network attains better performance on 3D object detection benchmarks; in particular, it improves accuracy by 4.02% over the LiDAR-only baseline, and its inference speed surpasses that of other models, further validating the efficacy of HS-Fusion.
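The abstract only describes the modules at a high level. As a rough illustration of the general idea behind a foreground-aware extractor over sparse voxel data, the following is a minimal, hypothetical sketch (not the authors' implementation): per-voxel features are scored by a small classifier head and only the highest-responding (likely foreground) voxels are kept for the downstream sparse backbone. All class names, shapes, and the keep-ratio mechanism are assumptions made for illustration.

```python
import torch
import torch.nn as nn


class ForegroundAwareSelector(nn.Module):
    """Hypothetical sketch of foreground-aware voxel selection.

    Scores each active voxel's feature vector and keeps the top-k
    highest-responding voxels. This is NOT the paper's actual module;
    names and shapes are assumptions for illustration only.
    """

    def __init__(self, channels: int, keep_ratio: float = 0.5):
        super().__init__()
        self.keep_ratio = keep_ratio
        # Small head predicting a per-voxel foreground score.
        self.score_head = nn.Sequential(
            nn.Linear(channels, channels // 2),
            nn.ReLU(inplace=True),
            nn.Linear(channels // 2, 1),
        )

    def forward(self, feats: torch.Tensor, coords: torch.Tensor):
        # feats:  (N, C) features of N active (non-empty) voxels
        # coords: (N, 4) voxel indices, e.g. (batch, z, y, x)
        scores = self.score_head(feats).squeeze(-1)   # (N,) foreground logits
        k = max(1, int(feats.shape[0] * self.keep_ratio))
        top_idx = torch.topk(scores, k).indices        # high-response voxels
        return feats[top_idx], coords[top_idx], scores.sigmoid()


if __name__ == "__main__":
    N, C = 1000, 64
    selector = ForegroundAwareSelector(C, keep_ratio=0.3)
    feats = torch.randn(N, C)
    coords = torch.randint(0, 100, (N, 4))
    kept_feats, kept_coords, probs = selector(feats, coords)
    print(kept_feats.shape, kept_coords.shape)  # (300, 64) and (300, 4)
```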
Pages: 12