MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Cited by: 0

Authors:

Chenxing Xia

Wenjun Zhao

Huidan Han

Zhanpeng Tao

Bin Ge

Xiuju Gao

Kuan-Ching Li

Yan Zhang

Affiliations:

[1] Anhui University of Science and Technology, College of Computer Science and Engineering

[2] Institute of Energy, College of Electrical and Information Engineering

[3] Hefei Comprehensive National Science Center, Department of Computer Science and Information Engineering

[4] Anhui Purvar Bigdata Technology Co. Ltd, The School of Electronics and Information Engineering

[5] Anyang Cigarette Factory

[6] China Tobacco Henan Industrial Co., Ltd.

[7] Anhui University of Science and Technology

[8] Providence University

[9] Anhui University

Source:

Journal of Intelligent & Robotic Systems | 2024 / Vol. 110

Keywords:

Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving

DOI:

Not available

Abstract:

Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in autonomous driving and mobile robotics. Without reliable depth information, accurate 3D localization is extremely difficult. In recent years, center-guided monocular 3D object detectors have directly regressed the absolute depth of the object center on top of 2D detection. However, this approach relies heavily on local semantic information and ignores contextual spatial cues and global-to-local visual correlations. Moreover, visual variations in the scene lead to unavoidable depth prediction errors for objects at different scales. To address these limitations, we propose a Mono3OD framework based on scene-level adaptive instance depth estimation (MonoSAID). First, the continuous depth is discretized into multiple bins, and the width distribution of the depth bins is adaptively generated from scene-level contextual semantic information. Then, the correlation between global contextual semantic features and the local semantic features of instances is established, and instance depth is solved by linearly combining the bin centers with the probability distribution derived from the local instance features. In addition, a multi-scale spatial perception attention module is designed to extract attention maps at various scales through pyramid pooling operations. This design enlarges the model's receptive field and strengthens its multi-scale spatial perception, thereby improving its ability to model target objects. We conducted extensive experiments on the KITTI and Waymo datasets. The results show that MonoSAID effectively improves 3D detection accuracy and robustness, and that our method achieves state-of-the-art performance.
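The adaptive-bin depth step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the softmax normalization of bin widths and of instance scores, the depth range of 0 to 80 m, and all function names (`bin_centers`, `instance_depth`) are assumptions made for illustration. It only shows the general idea of turning scene-level width predictions into bin centers and taking a probability-weighted linear combination of those centers as the instance depth.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def bin_centers(width_logits, d_min, d_max):
    """Map scene-level width logits to bin centers covering [d_min, d_max].

    Assumption: widths are normalized with a softmax so they sum to 1,
    then scaled to the depth range.
    """
    widths = softmax(width_logits)
    span = d_max - d_min
    centers, left = [], d_min
    for w in widths:
        centers.append(left + 0.5 * w * span)  # midpoint of this bin
        left += w * span                       # move to the next bin edge
    return centers


def instance_depth(centers, instance_logits):
    """Depth as the probability-weighted linear combination of bin centers.

    Assumption: per-instance scores over bins are normalized with a softmax.
    """
    probs = softmax(instance_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Hypothetical inputs: four bins with equal predicted widths over 0-80 m.
centers = bin_centers([0.0, 0.0, 0.0, 0.0], d_min=0.0, d_max=80.0)
# Uniform instance scores give the mean of the bin centers.
depth = instance_depth(centers, [0.0, 0.0, 0.0, 0.0])
print(centers, depth)
```

In a real detector the width logits would come from a scene-level feature branch and the instance logits from per-object features; here both are hard-coded placeholders.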
