Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2
Authors
Hu, Henan [1, 2]
Li, Muyu [3]
Zhu, Ming [1]
Gao, Wen [4]
Liu, Peiyu [5]
Chan, Kwok-Leung [6]
Affiliations
[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China
[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China
[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China
[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China
[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China
[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving;
DOI
10.1109/ACCESS.2023.3300708
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In autonomous driving, environmental perception across a 360-degree field of view is essential. It can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented as the bird's-eye-view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition. However, because RGB images are two-dimensional (2D), the BEV generated from them suffers from low accuracy due to limitations such as a lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. The motion features and depth information are combined and encoded using an encoder based on the Transformer cross-correlation module, and further integrated into the BEV space of the fused long short-term temporal features. Subsequently, a decoder with motion feature distillation is used to localize objects in 3D space. By combining BEV feature representations from different time steps, supplemented with embedded motion features and depth information, our proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results on the nuScenes dataset. Our proposed method outperforms state-of-the-art methods, improving on the previous best by 6.7% on mAP and 8.3% on mATE.
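The abstract names two ideas, fusing BEV features from different time steps and distilling motion features, without giving their exact form. The following is only a generic sketch of those two ideas, not the authors' implementation: the weighted-average fusion, the mean-squared-error loss, and the names `fuse_bev`, `distillation_loss`, and `alpha` are all assumptions made for illustration.

```python
from typing import List

# Hypothetical stand-in: a single-channel BEV feature map as rows x cols floats.
Grid = List[List[float]]


def fuse_bev(long_term: Grid, short_term: Grid, alpha: float = 0.5) -> Grid:
    """Blend two same-sized BEV maps (assumed fusion rule: weighted average).

    alpha weights the long-term map; (1 - alpha) weights the short-term map.
    """
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row_l, row_s)]
        for row_l, row_s in zip(long_term, short_term)
    ]


def distillation_loss(student: Grid, teacher: Grid) -> float:
    """Mean squared error between student and teacher feature maps.

    MSE is a common generic choice for feature distillation; the paper's
    actual loss is not stated in the abstract.
    """
    diffs = [
        (s - t) ** 2
        for row_s, row_t in zip(student, teacher)
        for s, t in zip(row_s, row_t)
    ]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    fused = fuse_bev([[1.0, 3.0]], [[3.0, 1.0]])
    print(fused)
    print(distillation_loss(fused, [[2.0, 4.0]]))
```

In a real detector these maps would be multi-channel tensors and the fusion would be learned; the sketch only shows the shape of the computation.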
Pages: 82933-82945
Number of pages: 13