Monocular 3D Object Detection for Autonomous Driving Based on Contextual Transformer

Cited by: 0
Authors
She, Xiangyang [1 ]
Yan, Weijia [1 ]
Dong, Lihong [1 ]
Affiliations
[1] College of Computer Science and Technology, Xi'an University of Science and Technology, Xi'an
Keywords
autonomous driving; Contextual Transformer; coordinate attention mechanism; monocular 3D object detection; multi-scale perception;
DOI
10.3778/j.issn.1002-8331.2307-0084
Abstract
Aiming at the problems of missed detections and poor multi-scale object detection in current monocular 3D object detection, a monocular 3D object detection algorithm for autonomous driving based on Contextual Transformer (CM-RTM3D) is proposed. First, Contextual Transformer (CoT) is introduced into the ResNet-50 network to build a ResNet-Transformer architecture for feature extraction. Second, a multi-scale spatial perception (MSP) module is designed to compensate for the loss of shallow features through scale-space response operations; it embeds the coordinate attention (CA) mechanism along the horizontal and vertical spatial directions and generates soft importance weights for each scale with the softmax function. Finally, the Huber loss replaces the L1 loss in the offset loss term. Experimental results on the KITTI autonomous driving dataset show that, compared with the RTM3D algorithm, the proposed algorithm improves AP3D by 4.84, 3.82, and 5.36 percentage points and APBEV by 4.75, 6.26, and 3.56 percentage points at the easy, moderate, and hard difficulty levels, respectively. © 2024 Journal of Computer Engineering and Applications Beijing Co., Ltd.; Science Press. All rights reserved.
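Two components named in the abstract can be illustrated in isolation: the softmax soft weights over feature scales in the MSP module, and the Huber loss that replaces L1 in the offset loss. A minimal sketch follows; the function names and the `delta` threshold are assumptions for illustration, not details from the paper:

```python
import math

def scale_softmax(scores):
    # Softmax over per-scale importance scores -> soft weights summing to 1,
    # as the MSP module does across feature scales (per the abstract).
    m = max(scores)                      # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def huber_loss(pred, target, delta=1.0):
    # Huber loss used in place of L1 for the offset term: quadratic for small
    # residuals (smooth gradient near zero), linear for large ones (robust to
    # outliers). `delta` is an assumed hyperparameter, not from the paper.
    r = abs(pred - target)
    if r <= delta:
        return 0.5 * r * r
    return delta * (r - 0.5 * delta)
```

For a residual of 0.5 with `delta=1.0`, the loss is `0.5 * 0.5**2 = 0.125`; for a residual of 2.0 it is `1.0 * (2.0 - 0.5) = 1.5`, i.e. smaller than the plain L1 value of 2.0, which is what makes the offset regression less sensitive to outlier keypoints.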
Pages: 178-189 (11 pages)
References
23 in total (10 shown)
[1] READING C, HARAKEH A, CHAE J, et al., Categorical depth distribution network for monocular 3D object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8555-8564, (2021)
[2] DING M, HUO Y, YI H, et al., Learning depth-guided convolutions for monocular 3D object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pp. 1000-1001, (2020)
[3] LU Y, MA X, YANG L, et al., Geometry uncertainty projection network for monocular 3D object detection, Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 3111-3121, (2021)
[4] KU J, PON A, WASLANDER S., Monocular 3D object detection leveraging accurate proposals and shape reconstruction, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 11867-11876, (2019)
[5] LI B, OUYANG W, SHENG L, et al., GS3D: an efficient 3D object detection framework for autonomous driving, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 1019-1028, (2019)
[6] CHEN Y, TAI L, SUN K, et al., MonoPair: monocular 3D object detection using pairwise spatial relationships, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 12093-12102, (2020)
[7] CHABOT F, CHAOUCH M, RABARISOA J, et al., Deep MANTA: a coarse-to-fine many-task network for joint 2D and 3D vehicle analysis from monocular image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2040-2049, (2017)
[8] LI P, ZHAO H, LIU P, et al., RTM3D: real-time monocular 3D detection from object keypoints for autonomous driving, Proceedings of the European Conference on Computer Vision, pp. 644-660, (2020)
[9] MA X, ZHANG Y, XU D, et al., Delving into localization errors for monocular 3D object detection, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4721-4730, (2021)
[10] ZHOU X, WANG D, KRAHENBUHL P., Objects as points, arXiv preprint, (2019)