MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Cited by: 0
Authors
Chenxing Xia
Wenjun Zhao
Huidan Han
Zhanpeng Tao
Bin Ge
Xiuju Gao
Kuan-Ching Li
Yan Zhang
Affiliations
[1] Anhui University of Science and Technology,College of Computer Science and Engineering
[2] Institute of Energy,College of Electrical and Information Engineering
[3] Hefei Comprehensive National Science Center,Department of Computer Science and Information Engineering
[4] Anhui Purvar Bigdata Technology Co. Ltd,The School of Electronics and Information Engineering
[5] Anyang Cigarette Factory
[6] China Tobacco Henan Industrial Co., Ltd.
[7] Anhui University of Science and Technology
[8] Providence University
[9] Anhui University
Source
Journal of Intelligent & Robotic Systems | 2024 / Vol. 110
Keywords
Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in autonomous driving and mobile robotics. The lack of reliable depth information makes it extremely difficult to obtain accurate 3D positions. In recent years, center-guided monocular 3D object detectors have directly regressed the absolute depth of the object center from 2D detections. However, this approach relies heavily on local semantic information and ignores contextual spatial cues and global-to-local visual correlations. Moreover, visual variations in the scene inevitably lead to depth prediction errors for objects at different scales. To address these limitations, we propose a Mono3OD framework based on scene-level adaptive instance depth estimation (MonoSAID). First, the continuous depth range is discretized into multiple bins, and the width distribution of the depth bins is generated adaptively from scene-level contextual semantic information. Then, the correlation between global contextual semantic features and the local semantic features of each instance is established, and the instance depth is computed as a linear combination of the bin centers weighted by the probability distribution derived from the local instance features. In addition, a multi-scale spatial perception attention module is designed to extract attention maps at various scales through pyramid pooling operations. This design enlarges the model's receptive field and strengthens its multi-scale spatial perception, thereby improving its ability to model target objects. We conducted extensive experiments on the KITTI and Waymo datasets. The results show that MonoSAID effectively improves 3D detection accuracy and robustness and achieves state-of-the-art performance.
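The bin-based depth step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the softmax normalization of widths and probabilities, and the 0 to 80 m depth range are all assumptions made for illustration. The sketch only shows the general idea of turning scene-level width scores into adaptive bin centers and then taking a probability-weighted combination of those centers as the instance depth.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_bin_centers(width_logits, d_min, d_max):
    """Turn scene-level width scores into depth-bin centers (hypothetical helper).

    The scores are normalized so the bin widths sum to the full depth
    range; each center is the midpoint of its bin.
    """
    widths = softmax(width_logits)
    centers = []
    left_edge = d_min
    for w in widths:
        span = w * (d_max - d_min)
        centers.append(left_edge + span / 2.0)
        left_edge += span
    return centers


def instance_depth(bin_logits, centers):
    """Depth as a probability-weighted linear combination of bin centers."""
    probs = softmax(bin_logits)
    return sum(p * c for p, c in zip(probs, centers))


# Assumed depth range of 0 to 80 m with four bins and uniform scores.
centers = adaptive_bin_centers([0.0, 0.0, 0.0, 0.0], d_min=0.0, d_max=80.0)
depth = instance_depth([0.0, 0.0, 0.0, 0.0], centers)
near_depth = instance_depth([100.0, 0.0, 0.0, 0.0], centers)
```

With uniform scores the bins are equally wide and the predicted depth falls at the middle of the range; a probability distribution peaked on the first bin pulls the depth toward that bin's center. In a real detector both sets of scores would come from network heads rather than fixed lists.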
Related papers
50 in total
  • [31] MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods
    Pan, Huihui
    Jia, Yisong
    Wang, Jue
    Sun, Weichao
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) : 3574 - 3587
  • [32] Monocular 3D Object Detection With Motion Feature Distillation
    Hu, Henan
    Li, Muyu
    Zhu, Ming
    Gao, Wen
    Liu, Peiyu
    Chan, Kwok-Leung
    IEEE ACCESS, 2023, 11 : 82933 - 82945
  • [33] 3D Visual Object Detection from Monocular Images
    Wang, Qiaosong
    Rasmussen, Christopher
    ADVANCES IN VISUAL COMPUTING, ISVC 2019, PT I, 2020, 11844 : 168 - 180
  • [34] Shape-Aware Monocular 3D Object Detection
    Chen, Wei
    Zhao, Jie
    Zhao, Wan-Lei
    Wu, Song-Yuan
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2023, 24 (06) : 6416 - 6424
  • [35] Multi-Scale Enhanced Depth Knowledge Distillation for Monocular 3D Object Detection with SEFormer
    Zhang, Han
    Li, Jun
    Tang, Rui
    Shi, Zhiping
    Bu, Aojie
    2023 IEEE INTERNATIONAL CONFERENCES ON INTERNET OF THINGS, ITHINGS IEEE GREEN COMPUTING AND COMMUNICATIONS, GREENCOM IEEE CYBER, PHYSICAL AND SOCIAL COMPUTING, CPSCOM IEEE SMART DATA, SMARTDATA AND IEEE CONGRESS ON CYBERMATICS,CYBERMATICS, 2024, : 38 - 43
  • [36] 3D Street Object Detection from Monocular Images Using Deep Learning and Depth Information
    Liu, Wei
    Zhang, Tao
    Ma, Yun
    Wei, Longsheng
    JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS, 2023, 27 (02) : 198 - 206
  • [37] A Kinect-Based 3D Object Detection and Recognition System with Enhanced Depth Estimation Algorithm
    Elaraby, Ahmed Fawzy
    Hamdy, Ayman
    Rehan, Mohamed
    2018 IEEE 9TH ANNUAL INFORMATION TECHNOLOGY, ELECTRONICS AND MOBILE COMMUNICATION CONFERENCE (IEMCON), 2018, : 247 - 252
  • [38] GPro3D: Deriving 3D BBox from ground plane in monocular 3D object detection
    Yang, Fan
    Xu, Xinhao
    Chen, Hui
    Guo, Yuchen
    He, Yuwei
    Ni, Kai
    Ding, Guiguang
    NEUROCOMPUTING, 2023, 562
  • [39] FCOS3Dformer: enhancing monocular 3D object detection through transformer-assisted fusion of depth information
    Hao, Bingsen
    Deng, Zhaoxue
    Liu, Mingze
    Liu, Can
    International Journal of Vehicle Systems Modelling and Testing, 2024, 18 (03) : 228 - 244
  • [40] 3D object detection based on point cloud in automatic driving scene
    Li, Hai-Sheng
    Lu, Yan-Ling
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (05) : 13029 - 13044