Towards efficient multi-modal 3D object detection: Homogeneous sparse fuse network

Times cited: 1

Authors:

Tang, Yingjuan [1]

He, Hongwen [1]

Wang, Yong [1]

Wu, Jingda [2]

Affiliations:

[1] Beijing Inst Technol, Sch Mech Engn, Beijing 100081, Peoples R China

[2] Nanyang Technol Univ, Sch Mech & Aerosp Engn, 50 Nanyang Ave, Singapore 639798, Singapore

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 256

Keywords:

Autonomous driving; 3D object detection; Multi-modal; Sparse convolutional networks; Point cloud and image fusion; Homogeneous fusion

DOI:

10.1016/j.eswa.2024.124945

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

LiDAR-only 3D detection methods struggle with the sparsity of point clouds. To overcome this issue, multi-modal methods have been proposed, but their fusion is a challenge due to the heterogeneous representation of images and point clouds. This paper proposes a novel multi-modal framework, Homogeneous Sparse Fusion (HS-Fusion), which generates pseudo point clouds from depth completion. The proposed framework introduces a 3D foreground-aware middle extractor that efficiently extracts high-responding foreground features from sparse point cloud data. This module can be integrated into existing sparse convolutional neural networks. Furthermore, the proposed homogeneous attentive fusion enables cross-modality consistency fusion. Finally, the proposed HS-Fusion can simultaneously combine 2D image features and 3D geometric features of pseudo point clouds using multi-representation feature extraction. The proposed network has been found to attain better performance on the 3D object detection benchmarks. In particular, the proposed model demonstrates a 4.02% improvement in accuracy compared to the pure model. Moreover, its inference speed surpasses that of other models, thus further validating the efficacy of HS-Fusion.

Pages: 12

References: 52 in total

