3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Fusion of Point Cloud

Cited by: 0

Authors:

Li, Hui [1]

Wang, Junyin [1]

Cheng, Yuanzhi [2]

Liu, Jian [3]

Zhao, Guowei [1]

Chen, Shuangmin [1]

Affiliations:

[1] School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao

[2] Faculty of Computing, Harbin Institute of Technology, Harbin

[3] College of Computer Science, Nankai University, Tianjin

Source:

Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2024 / Vol. 36 / No. 05

Keywords:

3D object detection; anchor-free; cross-modal; point cloud; semantic feature

DOI:

10.3724/SP.J.1089.2024.19862

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline code:

0812

Abstract:

Owing to scene complexity, changes in object scale, occlusion, and similar factors, object detection still faces many challenges. Cross-modal fusion of image features and LiDAR point cloud information can effectively improve 3D object detection performance, but both the fusion effect and the detection performance still leave room for improvement. This paper therefore first designs an image semantic feature learning network that computes position and channel self-attention in two parallel branches to achieve global semantic enhancement and thereby reduce target misclassification. Second, it proposes a local semantic fusion module guided by image semantic features, which uses element-level data concatenation to fuse the point cloud data with the local semantic features retrieved from the image, better addressing the problem of semantic alignment in cross-modal information fusion. Third, it proposes a multi-scale re-fusion network with an interaction module between the fused features and the LiDAR features; this network learns multi-scale connections within the fused features and re-fuses features of different resolutions to improve detection performance. Finally, four task losses are used to build an anchor-free 3D multi-object detector. In comparisons with other methods on the KITTI and nuScenes datasets, the method reaches a 3D object detection accuracy of 87.15%, and the experimental results show that it outperforms the compared methods in 3D detection performance. © 2024 Institute of Computing Technology. All rights reserved.
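The abstract does not give layer-level details, so the following is only a minimal NumPy sketch, under stated assumptions, of the two mechanisms it names: a position branch and a channel branch of self-attention computed in parallel over an image feature map (modeled here on DANet-style dual attention, which is an assumption), and element-level concatenation of each LiDAR point's feature with the image feature at its projected pixel. The function names, the identity query/key/value projections, the residual sum of the two branches, the nearest-pixel lookup, and the 3x4 projection matrix `P` are all hypothetical choices for illustration, not the authors' implementation.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def position_attention(feat):
    # Position branch: every pixel attends to every other pixel.
    # Assumption: identity query/key/value projections (no learned weights).
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)         # (C, N) with N = H * W pixels
    attn = softmax(x.T @ x, axis=-1)   # (N, N) pixel-to-pixel weights, rows sum to 1
    return (x @ attn.T).reshape(C, H, W)


def channel_attention(feat):
    # Channel branch: every channel attends to every other channel.
    C, H, W = feat.shape
    x = feat.reshape(C, H * W)         # (C, N)
    attn = softmax(x @ x.T, axis=-1)   # (C, C) channel-to-channel weights
    return (attn @ x).reshape(C, H, W)


def dual_branch_attention(feat):
    # The two branches are independent, so they can run in parallel.
    # Assumption: outputs are merged by summation with a residual connection.
    return feat + position_attention(feat) + channel_attention(feat)


def project_points(points, P):
    # Project (M, 3) LiDAR points to (M, 2) pixel coordinates with a
    # hypothetical 3x4 camera matrix P; assumes all points lie in front of the camera.
    homo = np.hstack([points, np.ones((points.shape[0], 1))])  # (M, 4)
    uvw = homo @ P.T                                           # (M, 3)
    return uvw[:, :2] / uvw[:, 2:3]


def fuse_point_image(point_feat, points, img_feat, P):
    # Element-level fusion (assumed to mean concatenation): join each point's
    # feature (M, D) with the image feature (C channels) at its pixel -> (M, D + C).
    C, H, W = img_feat.shape
    uv = project_points(points, P)
    u = np.clip(np.round(uv[:, 0]).astype(int), 0, W - 1)  # column index
    v = np.clip(np.round(uv[:, 1]).astype(int), 0, H - 1)  # row index
    sampled = img_feat[:, v, u].T      # (M, C) nearest-pixel lookup
    return np.concatenate([point_feat, sampled], axis=1)


rng = np.random.default_rng(0)
img = rng.standard_normal((8, 4, 5))   # C=8 channels, H=4, W=5
enhanced = dual_branch_attention(img)
print(enhanced.shape)                  # (8, 4, 5)

P = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 2.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
pts = np.array([[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]])  # project to pixels (2, 2) and (3, 3)
fused = fuse_point_image(rng.standard_normal((2, 16)), pts, enhanced, P)
print(fused.shape)                     # (2, 24)
```

Summing the two branches keeps the feature shape unchanged, so the enhanced map can be sampled directly by the fusion step. A trained network would instead use learned projections and typically bilinear sampling, and would discard points that project behind the camera or outside the image.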


Pages: 734-749

Page count: 15

References (62 in total):

[1]

Zhang Yanyong, Zhang Sha, Zhang Yu, et al., Multi-modality fusion perception and computing in autonomous driving, Journal of Computer Research and Development, 57, 9, pp. 1781-1799, (2020)

[2]

Mousavian A, Anguelov D, Flynn J, et al., 3D bounding box estimation using deep learning and geometry, Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, pp. 5632-5640, (2017)

[3]

Zhou Feng, Tao Chongben, Zhang Zufeng, et al., 3D dynamic target detection algorithm based on voxel point cloud fusion, Journal of Computer-Aided Design and Computer Graphics, 34, 6, pp. 901-912, (2022)

[4]

Chen X Z, Ma H M, Wan J, et al., Multi-view 3D object detection network for autonomous driving, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 6526-6534, (2017)

[5]

Wang Y, Chao W L, Garg D, et al., Pseudo-LiDAR from visual depth estimation: bridging the gap in 3D object detection for autonomous driving, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8437-8445, (2019)

[6]

Shi S S, Wang X G, Li H S, PointRCNN: 3D object proposal generation and detection from point cloud, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 770-779, (2019)

[7]

Qi C R, Liu W, Wu C X, et al., Frustum PointNets for 3D object detection from RGB-D data, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 918-927, (2018)

[8]

Qi C R, Su H, Mo K, et al., PointNet: deep learning on point sets for 3D classification and segmentation, Proceedings of the 30th IEEE Conference on Computer Vision and Pattern Recognition, pp. 77-85, (2017)

[9]

Qi C R, Yi L, Su H, et al., PointNet++: deep hierarchical feature learning on point sets in a metric space

[10]

Gu S, Zhang Y G, Tang J H, et al., Road detection through CRF based LiDAR-camera fusion, Proceedings of the International Conference on Robotics and Automation, pp. 3832-3838, (2019)
