A Multimodal 3D Object Detection Method Based on Double-Fusion Framework

Cited by: 0

Authors:

Ge T.-A. ^{[1]}

Li H. ^{[1]}

Guo Y. ^{[1]}

Wang J.-Y. ^{[2]}

Zhou D. ^{[1]}

Affiliations:

[1] School of Data Science, Qingdao University of Science and Technology, Shandong, Qingdao

[2] School of Computer Science and Artificial Intelligence, Wuhan University of Technology, Hubei, Wuhan

Source:

Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2023 / Vol. 51 / No. 11

Funding:

National Natural Science Foundation of China;

Keywords:

3D object detection; camera; deep learning; LiDAR; multimodal information fusion;

DOI:

10.12263/DZXB.20230414

Chinese Library Classification (CLC) number:

Subject classification code:

Abstract:

3D object detection based on the multimodal fusion of camera and LiDAR data can exploit the complementary strengths of the two sensors to improve detection accuracy and robustness. However, owing to the complexity of the environment and the inherent differences among multimodal data, 3D object detection still faces many challenges. In this paper, we propose a multimodal 3D object detection algorithm built on a double-fusion framework. We design a voxel-level and grid-level double-fusion framework that effectively alleviates the semantic differences between the modalities. We propose the ABFF (Adaptive Bird-eye-view Features Fusion) module to enhance the algorithm's ability to perceive small-object features. Using voxel-level global fusion information to guide grid-level local fusion, we propose a Transformer-based multimodal grid feature encoder that extracts richer context information in 3D detection scenes and improves the efficiency of the algorithm. Experimental results on the KITTI standard dataset show that the proposed 3D object detection algorithm reaches an average detection accuracy of 78.79%, which indicates improved 3D object detection performance. © 2023 Chinese Institute of Electronics. All rights reserved.
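The abstract does not specify how the ABFF module combines bird's-eye-view features, so the following is only an illustrative sketch of one generic way to fuse two BEV feature grids adaptively: a per-cell softmax over two scalar scores yields the mixing weights. The function name `adaptive_bev_fusion`, the score inputs, and the nested-list layout are all assumptions made for illustration and are not taken from the paper.

```python
import math


def adaptive_bev_fusion(lidar_bev, camera_bev, lidar_score, camera_score):
    """Fuse two BEV feature grids with per-cell softmax weights.

    Hypothetical sketch, NOT the paper's ABFF module.
    lidar_bev, camera_bev: H x W x C nested lists of floats.
    lidar_score, camera_score: H x W nested lists of raw scores
    (in a real network these would be predicted by learned layers).
    """
    fused = []
    for i, (l_row, c_row) in enumerate(zip(lidar_bev, camera_bev)):
        fused_row = []
        for j, (l_feat, c_feat) in enumerate(zip(l_row, c_row)):
            s_l, s_c = lidar_score[i][j], camera_score[i][j]
            # Numerically stable two-way softmax over the modality scores.
            m = max(s_l, s_c)
            e_l, e_c = math.exp(s_l - m), math.exp(s_c - m)
            w_l = e_l / (e_l + e_c)
            w_c = 1.0 - w_l
            # Weighted sum of the two feature vectors in this BEV cell.
            fused_row.append([w_l * a + w_c * b for a, b in zip(l_feat, c_feat)])
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    lidar = [[[1.0, 2.0]]]   # 1 x 1 grid, 2 channels
    camera = [[[3.0, 4.0]]]
    # Equal scores give equal weights, so the result is the element-wise mean.
    print(adaptive_bev_fusion(lidar, camera, [[0.0]], [[0.0]]))
```

With equal scores the two modalities contribute equally; a much larger score for one modality makes the fused cell follow that modality's features almost exactly.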

Citation

Pages: 3100-3110

Number of pages: 10

35 references in total

[1]

YAN Y, MAO Y X, LI B., SECOND: Sparsely embedded convolutional detection, Sensors, 18, 10, (2018)

[2]

SHI S S, GUO C X, JIANG L, et al., PV-RCNN: Point-voxel feature set abstraction for 3D object detection, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 10529-10538, (2020)

[3]

DENG J J, SHI S S, LI P W, et al., Voxel R-CNN: Towards high performance voxel-based 3D object detection, Proceedings of the AAAI Conference on Artificial Intelligence, 35, 2, pp. 1201-1209, (2021)

[4]

ZHENG W, TANG W L, JIANG L, et al., SE-SSD: Self-ensembling single-stage object detector from point cloud, 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 14494-14503, (2021)

[5]

HU J S K, KUAI T S, WASLANDER S L., Point density-aware voxels for LiDAR 3D object detection, 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8469-8478, (2022)

[6]

WU H, DENG J H, WEN C L, et al., CasA: A cascade attention network for 3-D object detection from LiDAR point clouds, IEEE Transactions on Geoscience and Remote Sensing, 60, pp. 1-11, (2022)

[7]

PHILION J, FIDLER S., Lift, splat, shoot: Encoding images from arbitrary camera rigs by implicitly unprojecting to 3D, Computer Vision – ECCV 2020, pp. 194-210, (2020)

[8]

LI Y H, GE Z, YU G Y, et al., BEVDepth: Acquisition of reliable depth for multi-view 3D object detection, Proceedings of the AAAI Conference on Artificial Intelligence, 37, 2, pp. 1477-1485, (2023)

[9]

LI Z Q, WANG W H, LI H Y, et al., BEVFormer: Learning bird's-eye-view representation from multi-camera images via spatiotemporal Transformers, Lecture Notes in Computer Science, pp. 1-18, (2022)

[10]

VORA S, LANG A H, HELOU B, et al., PointPainting: Sequential fusion for 3D object detection, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4604-4612, (2020)
