MonoSAID: Monocular 3D Object Detection based on Scene-Level Adaptive Instance Depth Estimation

Citations: 0
Authors
Chenxing Xia
Wenjun Zhao
Huidan Han
Zhanpeng Tao
Bin Ge
Xiuju Gao
Kuan-Ching Li
Yan Zhang
Affiliations
[1] Anhui University of Science and Technology, College of Computer Science and Engineering
[2] Institute of Energy, College of Electrical and Information Engineering
[3] Hefei Comprehensive National Science Center, Department of Computer Science and Information Engineering
[4] Anhui Purvar Bigdata Technology Co. Ltd, The School of Electronics and Information Engineering
[5] Anyang Cigarette Factory
[6] China Tobacco Henan Industrial Co., Ltd.
[7] Anhui University of Science and Technology
[8] Providence University
[9] Anhui University
Source
Journal of Intelligent & Robotic Systems | 2024 / Vol. 110
Keywords
Monocular 3D object detection; Deep learning; Depth estimation; Autonomous driving
DOI
Not available
Abstract
Monocular 3D object detection (Mono3OD) is a challenging yet cost-effective vision task in autonomous driving and mobile robotics. The lack of reliable depth information makes it extremely difficult to obtain accurate 3D positions. In recent years, center-guided monocular 3D object detectors have directly regressed the absolute depth of the object center on top of 2D detection. However, this approach relies heavily on local semantic information and ignores contextual spatial cues and global-to-local visual correlations. Moreover, visual variations in the scene lead to unavoidable depth prediction errors for objects at different scales. To address these limitations, we propose a Mono3OD framework based on scene-level adaptive instance depth estimation (MonoSAID). First, the continuous depth is discretized into multiple bins, and the width distribution of the depth bins is generated adaptively from scene-level contextual semantic information. Then, the correlation between global contextual semantic features and the local semantic features of instances is established, and the instance depth is solved from the probability distribution representation of the local instance features and a linear combination of the bin centers. In addition, a multi-scale spatial perception attention module is designed to extract attention maps at various scales through pyramid pooling operations. This design enlarges the model's receptive field and strengthens its multi-scale spatial perception, thereby improving its ability to model target objects. We conducted extensive experiments on the KITTI and Waymo datasets. The results show that MonoSAID effectively improves 3D detection accuracy and robustness and achieves state-of-the-art performance.
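The adaptive-bin depth step described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the softmax normalization of bin widths, and the 1 m to 80 m depth range are assumptions made for illustration. The sketch only shows the general idea of turning scene-level scores into bin widths and then taking a probability-weighted linear combination of the bin centers as the instance depth.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def adaptive_bin_centers(width_logits, d_min=1.0, d_max=80.0):
    """Turn scene-level scores into depth-bin centers.

    width_logits: shape (K,), hypothetical per-scene scores, one per bin.
    d_min, d_max: assumed depth range in meters (not taken from the paper).
    """
    # Normalized widths sum to the full depth range.
    widths = softmax(width_logits) * (d_max - d_min)
    # Bin edges are the cumulative sum of the widths, starting at d_min.
    edges = d_min + np.concatenate([[0.0], np.cumsum(widths)])
    # Each center is the midpoint of its two edges.
    return 0.5 * (edges[:-1] + edges[1:])


def instance_depth(bin_logits, centers):
    """Depth as a probability-weighted linear combination of bin centers.

    bin_logits: shape (..., K), hypothetical per-instance scores over bins.
    """
    probs = softmax(bin_logits, axis=-1)
    return probs @ centers


# Usage: one scene with 4 bins and two instances.
centers = adaptive_bin_centers(np.array([0.2, 0.1, -0.3, 0.0]))
depths = instance_depth(np.array([[2.0, 0.5, 0.0, -1.0],
                                  [-1.0, 0.0, 0.5, 2.0]]), centers)
```

Because the widths are predicted per scene, two images can place the same number of bins at different depths, while each instance only has to predict a distribution over those bins.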