Monocular 3D Object Detection With Motion Feature Distillation

Cited by: 2

Authors:

Hu, Henan [1,2]

Li, Muyu [3]

Zhu, Ming [1]

Gao, Wen [4]

Liu, Peiyu [5]

Chan, Kwok-Leung [6]

Affiliations:

[1] Chinese Acad Sci, Changchun Inst Opt Fine Mech & Phys, Changchun 130033, Peoples R China

[2] Univ Chinese Acad Sci, Beijing 100049, Peoples R China

[3] Ctr Intelligent Multidimens Data Anal Ltd, Hong Kong, Peoples R China

[4] BYD Auto Ind Co Ltd, Shenzhen 518119, Peoples R China

[5] Shenyang Aircraft Design & Res Inst, Shenyang 110036, Liaoning, Peoples R China

[6] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China

Source:

IEEE ACCESS | 2023 / Vol. 11

Keywords:

Three-dimensional displays; Object detection; Feature extraction; Estimation; Location awareness; Image resolution; Solid modeling; 3D object detection; bird's-eye-view (BEV); monocular depth estimation; motion feature; knowledge distillation; autonomous driving

DOI:

10.1109/ACCESS.2023.3300708

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In the context of autonomous driving, environmental perception within a 360-degree field of view is extremely important. This can be achieved by detecting three-dimensional (3D) objects in the surrounding scene from inputs acquired by sensors such as LiDAR or RGB cameras. The resulting 3D perception is commonly represented as the bird's-eye-view (BEV) of the sensor. RGB cameras have the advantages of low cost and long-range acquisition. Because RGB images are two-dimensional (2D), the BEV generated from 2D images suffers from low accuracy due to limitations such as the lack of temporal correlation. To address these problems, we propose a monocular 3D object detection method based on long short-term feature fusion and motion feature distillation. Long short-term temporal features are extracted at different feature map resolutions. The motion features and depth information are combined and encoded using an encoder based on the Transformer cross-correlation module, and further integrated into the BEV space of fused long short-term temporal features. Subsequently, a decoder with motion feature distillation is used to localize objects in 3D space. By combining BEV feature representations of different time steps, supplemented with embedded motion features and depth information, our proposed method significantly improves the accuracy of monocular 3D object detection, as demonstrated by experimental results obtained on the nuScenes dataset. Our proposed method outperforms state-of-the-art methods, in particular the previous best method by 6.7% on mAP and 8.3% on mATE.


Pages: 82933 - 82945

Number of pages: 13

50 items in total

[21] Ground-Aware Monocular 3D Object Detection for Autonomous Driving
Liu, Yuxuan
Yuan, Yixuan
Liu, Ming
IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 919 - 926
[22] Adaptive Feature Aggregation Centric Enhance Network for Accurate and Fast Monocular 3-D Object Detection
Lin, Peng-Wei
Hsu, Chih-Ming
IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2024, 73
[23] BEV-LGKD: A Unified LiDAR-Guided Knowledge Distillation Framework for Multi-View BEV 3D Object Detection
Li, Jianing
Lu, Ming
Liu, Jiaming
Guo, Yandong
Du, Yuan
Du, Li
Zhang, Shanghang
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01) : 2489 - 2498
[24] OPA-3D: Occlusion-Aware Pixel-Wise Aggregation for Monocular 3D Object Detection
Su, Yongzhi
Di, Yan
Zhai, Guangyao
Manhardt, Fabian
Rambach, Jason
Busam, Benjamin
Stricker, Didier
Tombari, Federico
IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (03) : 1327 - 1334
[25] You Only Look Bottom-Up for Monocular 3D Object Detection
Xiong, Kaixin
Zhang, Dingyuan
Liang, Dingkang
Liu, Zhe
Yang, Hongcheng
Dikubab, Wondimu
Cheng, Jianwei
Bai, Xiang
IEEE ROBOTICS AND AUTOMATION LETTERS, 2023, 8 (11) : 7464 - 7471
[26] CL3D: Camera-LiDAR 3D Object Detection With Point Feature Enhancement and Point-Guided Fusion
Lin, Chunmian
Tian, Daxin
Duan, Xuting
Zhou, Jianshan
Zhao, Dezong
Cao, Dongpu
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (10) : 18040 - 18050
[27] Uncertainty Prediction for Monocular 3D Object Detection
Mun, Junghwan
Choi, Hyukdoo
SENSORS, 2023, 23 (12)
[28] Monocular 3D object detection for distant objects
Li, Jiahao
Han, Xiaohong
JOURNAL OF ELECTRONIC IMAGING, 2024, 33 (03) : 33021
[29] MonoAMNet: Three-Stage Real-Time Monocular 3D Object Detection With Adaptive Methods
Pan, Huihui
Jia, Yisong
Wang, Jue
Sun, Weichao
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2025, 26 (03) : 3574 - 3587
[30] SSD-MonoDETR: Supervised Scale-Aware Deformable Transformer for Monocular 3D Object Detection
He, Xuan
Yang, Fan
Yang, Kailun
Lin, Jiacheng
Fu, Haolong
Wang, Meng
Yuan, Jin
Li, Zhiyong
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2024, 9 (01) : 555 - 567
