Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2
Authors
Hu, Henan [1, 2]
Li, Muyu [3 ]
Zhu, Ming [1 ]
Gao, Wen [4 ]
Liu, Peiyu [5 ]
Chan, Kwok-Leung [6 ]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China
[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China
[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China
[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving;
DOI
10.1109/ACCESS.2023.3300708
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In autonomous driving, environmental perception across a 360-degree field of view is essential. It can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented as the bird's-eye view (BEV) of the sensor. RGB cameras offer low cost and long-range acquisition. However, because RGB images are two-dimensional (2D), a BEV generated from them suffers from low accuracy due to limitations such as the lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. The motion features and depth information are combined and encoded by an encoder based on the Transformer cross-correlation module, and then integrated into the BEV space of the fused long short-term temporal features. Subsequently, a decoder with motion feature distillation localizes objects in 3D space. By combining BEV feature representations from different time steps, supplemented with embedded motion features and depth information, the proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. It outperforms state-of-the-art methods, improving on the previous best method by 6.7% in mAP and 8.3% in mATE.
Pages: 82933 - 82945
Number of pages: 13
Related papers
50 in total
  • [31] MonoDFM: Density Field Modeling-Based End-to-End Monocular 3D Object Detection
    Liu, Gang
    Huang, Xinrui
    Xie, Xiaoxiao
    IEEE ACCESS, 2025, 13 : 74015 - 74031
  • [32] Super Sparse 3D Object Detection
    Fan, Lue
    Yang, Yuxue
    Wang, Feng
    Wang, Naiyan
    Zhang, Zhaoxiang
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (10) : 12490 - 12505
  • [33] Voxel-to-Pillar: Knowledge Distillation of 3D Object Detection in Point Cloud
    Zhang, Jinbao
    Liu, Jun
    PROCEEDINGS OF THE 4TH EUROPEAN SYMPOSIUM ON SOFTWARE ENGINEERING, ESSE 2023, 2024 : 99 - 104
  • [34] GPro3D: Deriving 3D BBox from ground plane in monocular 3D object detection
    Yang, Fan
    Xu, Xinhao
    Chen, Hui
    Guo, Yuchen
    He, Yuwei
    Ni, Kai
    Ding, Guiguang
    NEUROCOMPUTING, 2023, 562
  • [35] Point-Guided Contrastive Learning for Monocular 3-D Object Detection
    Feng, Dapeng
    Han, Songfang
    Xu, Hang
    Liang, Xiaodan
    Tan, Xiaojun
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (02) : 954 - 966
  • [36] MonoSG: Monocular 3D Object Detection With Stereo Guidance
    Fan, Zhiwei
    Xu, Chao
    Chu, Minghang
    Huang, Yuling
    Ma, Yaoyao
    Wang, Jing
    Xu, Yishen
    Wu, Di
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2025, 10 (04) : 3604 - 3611
  • [37] Fine-Grained Multilevel Fusion for Anti-Occlusion Monocular 3D Object Detection
    Liu, He
    Liu, Huaping
    Wang, Yikai
    Sun, Fuchun
    Huang, Wenbing
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 4050 - 4061
  • [38] Shape Prior Guided Instance Disparity Estimation for 3D Object Detection
    Chen, Linghao
    Sun, Jiaming
    Xie, Yiming
    Zhang, Siyu
    Shuai, Qing
    Jiang, Qinhong
    Zhang, Guofeng
    Bao, Hujun
    Zhou, Xiaowei
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (09) : 5529 - 5540
  • [39] Depth-Vision-Decoupled Transformer With Cascaded Group Convolutional Attention for Monocular 3-D Object Detection
    Xu, Yan
    Wang, Haoyuan
    Ji, Zhong
    Zhang, Qiyuan
    Jia, Qian
    Li, Xuening
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
  • [40] Diversity Knowledge Distillation for LiDAR-Based 3-D Object Detection
    Ning, Kanglin
    Liu, Yanfei
    Su, Yanzhao
    Jiang, Ke
    IEEE SENSORS JOURNAL, 2023, 23 (11) : 11181 - 11193