Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2
Authors
Hu, Henan [1, 2]
Li, Muyu [3 ]
Zhu, Ming [1 ]
Gao, Wen [4 ]
Liu, Peiyu [5 ]
Chan, Kwok-Leung [6 ]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China
[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China
[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China
[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving
DOI
10.1109/ACCESS.2023.3300708
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In autonomous driving, environmental perception across a 360-degree field of view is extremely important. This can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented as the bird's-eye-view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition. However, because RGB images are two-dimensional (2D), the BEV generated from them suffers from low accuracy due to limitations such as the lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. The motion features and depth information are combined and encoded using an encoder based on the Transformer cross-correlation module, and are further integrated into the BEV space of the fused long short-term temporal features. Subsequently, a decoder with motion feature distillation is used to localize objects in 3D space. By combining BEV feature representations from different time steps, supplemented with embedded motion features and depth information, the proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. The proposed method outperforms state-of-the-art methods, surpassing the previous best by 6.7% on mAP and 8.3% on mATE.
Pages: 82933-82945
Page count: 13