Dp-M3D: Monocular 3D object detection algorithm with depth perception capability

Cited: 13
Authors
Shi, Peicheng [1 ]
Dong, Xinlong [1 ]
Ge, Runshuai [1 ]
Liu, Zhiqiang [2 ]
Yang, Aixi [3 ]
Affiliations
[1] Anhui Polytech Univ, Sch Mech & Automot Engn, Wuhu 241000, Peoples R China
[2] Nanjing Univ Aeronaut & Astronaut, Sch Mech & Elect Engn, Nanjing 210000, Peoples R China
[3] Zhejiang Univ, Polytech Inst, Hangzhou 310015, Peoples R China
Keywords
Object detection; Depth perception ability; Feature fusion; Feature enhancement; Non-maximal suppression
DOI
10.1016/j.knosys.2025.113539
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Considering the limitations of monocular 3D object detection in depth information and perception ability, we introduce a novel monocular 3D object detection algorithm, Dp-M3D, equipped with depth perception capabilities. To effectively model long-range feature dependencies during the fusion of depth maps and image features, we introduce a Transformer Feature Fusion Encoder (TFFEn). TFFEn integrates depth and image features, enabling more comprehensive long-range feature modeling. This enhances depth perception, ultimately improving the accuracy of 3D object detection. To enhance the detection ability of truncated objects at the edges of an image, we propose a Feature Enhancement method based on Deformable Convolution (FEDC). FEDC leverages depth confidence guidance to determine the deformation offset of the 3D bounding box, aligning features more effectively and improving depth perception. Furthermore, to address the issue of anchor box ranking, where candidate boxes with accurate depth predictions but low classification confidence are suppressed, we propose a Depth-perception Non-Maximum Suppression (Dp-NMS) algorithm. Dp-NMS refines the selection process by incorporating the product of classification confidence and depth confidence, ensuring that candidate boxes are ranked effectively and the most suitable detection box is retained. Experimental results on the challenging KITTI 3D object detection dataset demonstrate that the proposed method achieves AP3D scores of 23.41%, 13.65%, and 12.91% in the easy, moderate, and hard categories, respectively. Our approach outperforms state-of-the-art monocular 3D object detection algorithms based on image and image-depth map fusion, with particularly significant improvements in detecting edge-truncated objects.
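The abstract specifies that Dp-NMS ranks candidate boxes by the product of classification confidence and depth confidence before greedy suppression. A minimal sketch of that ranking change, assuming standard greedy NMS over axis-aligned 2D boxes and an IoU threshold (the function names, box format, and threshold value here are illustrative assumptions, not the paper's implementation):

```python
import numpy as np

def iou(box, boxes):
    """Axis-aligned IoU between one box and an array of boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area_a + area_b - inter + 1e-9)

def dp_nms(boxes, cls_conf, depth_conf, iou_thresh=0.5):
    """Depth-perception NMS sketch: candidates are ranked by
    cls_conf * depth_conf rather than classification confidence alone,
    then suppressed greedily as in standard NMS. Returns kept indices."""
    score = cls_conf * depth_conf          # joint ranking score
    order = np.argsort(-score)             # indices, descending by score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        order = rest[iou(boxes[i], boxes[rest]) <= iou_thresh]
    return keep
```

Under this ranking, a box with modest classification confidence but a confident depth estimate can outrank (and therefore suppress) an overlapping box that standard score-only NMS would have kept, which is the failure mode the abstract describes.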
Pages: 14