3D Object Detection Method with Image Semantic Feature Guidance and Cross-Modal Fusion of Point Cloud

Cited by: 0
Authors
Li, Hui [1 ]
Wang, Junyin [1 ]
Cheng, Yuanzhi [2 ]
Liu, Jian [3 ]
Zhao, Guowei [1 ]
Chen, Shuangmin [1 ]
Affiliations
[1] School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao
[2] Faculty of Computing, Harbin Institute of Technology, Harbin
[3] College of Computer Science, Nankai University, Tianjin
Source
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao / Journal of Computer-Aided Design and Computer Graphics | 2024, Vol. 36, No. 05
Keywords
3D object detection; anchor-free; cross-modal; point cloud; semantic feature;
DOI
10.3724/SP.J.1089.2024.19862
CLC Number
TP3 [Computing Technology, Computer Technology];
Discipline Code
0812;
Abstract
Due to scene complexity and the effects of object scale changes, occlusion, etc., 3D object detection still faces many challenges. Cross-modal fusion of image and laser point cloud features can effectively improve 3D object detection performance, but both the fusion quality and the detection performance still need improvement. Therefore, this paper first designs an image semantic feature learning network that computes position and channel self-attention in two parallel branches, achieving global semantic enhancement and reducing target misclassification. Secondly, a local semantic fusion module guided by image semantic features is proposed; it uses element-level concatenation to fuse the point cloud data with the retrieved local image semantic features, better addressing the semantic alignment problem in cross-modal information fusion. Thirdly, a multi-scale re-fusion network is proposed, with an interaction module between the fused features and the LiDAR features that learns multi-scale connections within the fused features and re-fuses features of different resolutions, further improving detection performance. Finally, four task losses are adopted to train an anchor-free 3D multi-object detector. Compared with other methods on the KITTI and nuScenes datasets, the proposed method achieves a 3D detection accuracy of 87.15%; the experimental results show that it outperforms the comparison methods and delivers better 3D detection performance. © 2024 Institute of Computing Technology. All rights reserved.
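The dual-branch self-attention described in the abstract can be illustrated in code. Below is a minimal PyTorch sketch assuming a DANet-style design, in which a position (spatial) branch and a channel branch are computed in parallel and summed; all module and variable names are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch of dual-branch (position + channel) self-attention,
# assuming a DANet-style design; names are illustrative only.
import torch
import torch.nn as nn


class PositionAttention(nn.Module):
    """Spatial self-attention: each pixel attends to all other pixels."""
    def __init__(self, channels: int):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned residual weight

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                     # B x C' x HW
        attn = torch.softmax(q @ k, dim=-1)                    # B x HW x HW
        v = self.value(x).view(b, c, h * w)                    # B x C x HW
        out = (v @ attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x


class ChannelAttention(nn.Module):
    """Channel self-attention: each channel attends to all other channels."""
    def __init__(self):
        super().__init__()
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        feat = x.view(b, c, -1)                                      # B x C x HW
        attn = torch.softmax(feat @ feat.permute(0, 2, 1), dim=-1)   # B x C x C
        out = (attn @ feat).view(b, c, h, w)
        return self.gamma * out + x


class DualBranchAttention(nn.Module):
    """Runs both branches in parallel and sums them for global semantic enhancement."""
    def __init__(self, channels: int):
        super().__init__()
        self.position = PositionAttention(channels)
        self.channel = ChannelAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.position(x) + self.channel(x)


# Usage: enhance a 256-channel image feature map.
feats = torch.randn(2, 256, 32, 32)
enhanced = DualBranchAttention(256)(feats)
print(enhanced.shape)  # torch.Size([2, 256, 32, 32])
```

The image-guided local semantic fusion can likewise be sketched as projecting each LiDAR point into the image plane, sampling the semantic feature map at that location, and concatenating the result with the point feature (element-level splicing). The projection matrix, function name, and tensor layouts below are assumptions for illustration, not the paper's API.

```python
# Hedged sketch of guided point/image feature fusion; layouts are assumed.
import torch
import torch.nn.functional as F


def fuse_point_image_features(points_xyz, point_feats, img_feats, proj):
    """points_xyz: N x 3 LiDAR points; point_feats: N x Cp;
    img_feats: 1 x Ci x H x W semantic map; proj: 3 x 4 camera projection."""
    n = points_xyz.shape[0]
    homo = torch.cat([points_xyz, points_xyz.new_ones(n, 1)], dim=1)  # N x 4
    uvw = homo @ proj.T                                               # N x 3
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)                      # pixel coords
    h, w = img_feats.shape[-2:]
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.stack([uv[:, 0] / (w - 1), uv[:, 1] / (h - 1)], dim=1) * 2 - 1
    sampled = F.grid_sample(img_feats, grid.view(1, n, 1, 2),
                            align_corners=True)                       # 1 x Ci x N x 1
    img_per_point = sampled.squeeze(0).squeeze(-1).T                  # N x Ci
    # Element-level concatenation of point and sampled image features.
    return torch.cat([point_feats, img_per_point], dim=1)             # N x (Cp+Ci)
```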
Pages: 734-749
Page count: 15
References
62 in total
[21]  
Ercelik E, Yurtsever E, Liu M Y, et al., 3D object detection with a self-supervised Lidar scene flow backbone, Proceedings of the 17th European Conference on Computer Vision, pp. 247-265, (2022)
[22]  
Liu Z, Zhao X, Huang T T, et al., TANet: robust 3D object detection from point clouds with triple attention, Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 11677-11684, (2020)
[23]  
Lan Y Q, Duan Y, Liu C Y, et al., ARM3D: attention-based relation module for indoor 3D object detection, Computational Visual Media, 8, 3, pp. 395-414, (2022)
[24]  
Qi C R, Chen X L, Litany O, et al., ImVoteNet: boosting 3D object detection in point clouds with image votes, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4403-4412, (2020)
[25]  
Xu D F, Anguelov D, Jain A., PointFusion: deep sensor fusion for 3D bounding box estimation, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 244-253, (2018)
[26]  
Vora S, Lang A H, Helou B, et al., PointPainting: sequential fusion for 3D object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4603-4611, (2020)
[27]  
Yin T W, Zhou X Y, Krahenbuhl P., Multimodal virtual point 3D detection
[28]  
Zhang Z H, Zhang M, Liang Z D, et al., MAFF-Net: filter false positive for 3D vehicle detection with multi-modal adaptive feature fusion
[29]  
Li R H, Li X Z, Heng P A, et al., PointAugment: an auto-augmentation framework for point cloud classification, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 6377-6386, (2020)
[30]  
Tan X, Chen X Y, Zhang G W, et al., MBDF-Net: multi-branch deep fusion network for 3D object detection, Proceedings of the 1st International Workshop on Multimedia Computing for Urban Data, pp. 9-17, (2021)