UniDistill: A Universal Cross-Modality Knowledge Distillation Framework for 3D Object Detection in Bird's-Eye View

Citations: 34
Authors
Zhou, Shengchao [1 ]
Liu, Weizhou [1 ]
Hu, Chen [1 ]
Zhou, Shuchang [1 ]
Ma, Chao [2 ]
Affiliations
[1] MEGVII Technol, Beijing, Peoples R China
[2] Shanghai Jiao Tong Univ, AI Inst, MoE Key Lab Artificial Intelligence, Shanghai, Peoples R China
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR | 2023
DOI
10.1109/CVPR52729.2023.00495
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the field of 3D object detection for autonomous driving, the sensor portfolio including multi-modality and single-modality is diverse and complex. Since the multimodal methods have system complexity while the accuracy of single-modal ones is relatively low, how to make a trade-off between them is difficult. In this work, we propose a universal cross-modality knowledge distillation framework (UniDistill) to improve the performance of single-modality detectors. Specifically, during training, UniDistill projects the features of both the teacher and the student detector into Bird's-Eye-View (BEV), which is a friendly representation for different modalities. Then, three distillation losses are calculated to sparsely align the foreground features, helping the student learn from the teacher without introducing additional cost during inference. Taking advantage of the similar detection paradigm of different detectors in BEV, UniDistill easily supports LiDAR-to-camera, camera-to-LiDAR, fusion-to-LiDAR and fusion-to-camera distillation paths. Furthermore, the three distillation losses can filter the effect of misaligned background information and balance between objects of different sizes, improving the distillation effectiveness. Extensive experiments on nuScenes demonstrate that UniDistill effectively improves the mAP and NDS of student detectors by 2.0% to 3.2%.
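
The idea of sparsely aligning foreground features in BEV can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not define the three losses, so the single mean-squared-error term below, the nested-list feature layout, and the names `foreground_cells` and `sparse_feature_distill_loss` are all assumptions made for illustration. The sketch only shows how restricting the loss to object locations keeps misaligned background cells from contributing.

```python
from typing import List, Sequence, Tuple

# Assumed layout: a BEV feature map indexed as [row][col][channel].
Grid = List[List[List[float]]]


def foreground_cells(
    centers: Sequence[Tuple[int, int]], height: int, width: int
) -> List[Tuple[int, int]]:
    """Keep only the (row, col) object centers that fall inside the BEV grid."""
    return [(r, c) for r, c in centers if 0 <= r < height and 0 <= c < width]


def sparse_feature_distill_loss(
    teacher: Grid, student: Grid, centers: Sequence[Tuple[int, int]]
) -> float:
    """Mean squared teacher/student difference, evaluated only at object centers.

    Hypothetical simplification: one loss term, one cell per object.
    Averaging per object (not per cell area) gives every object equal weight.
    """
    height, width = len(teacher), len(teacher[0])
    cells = foreground_cells(centers, height, width)
    if not cells:
        return 0.0  # no foreground in this sample, nothing to align
    total = 0.0
    for r, c in cells:
        t_vec, s_vec = teacher[r][c], student[r][c]
        total += sum((t - s) ** 2 for t, s in zip(t_vec, s_vec)) / len(t_vec)
    return total / len(cells)


# Toy 2x2 BEV grid with 2 channels; objects sit at (0, 0) and (1, 1).
teacher = [[[1.0, 2.0], [0.0, 0.0]], [[0.0, 0.0], [3.0, 4.0]]]
student = [[[1.0, 0.0], [9.0, 9.0]], [[9.0, 9.0], [3.0, 4.0]]]
loss = sparse_feature_distill_loss(teacher, student, [(0, 0), (1, 1)])
print(loss)
```

In the toy example the student's background cells differ strongly from the teacher's, yet they do not affect the result because only the two object cells are compared. The loss is used during training only, so the student's inference path is unchanged.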
Pages: 5116-5125
Page count: 10
Related papers
47 in total
[1]   M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [J].
Brazil, Garrick ;
Liu, Xiaoming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9286-9295
[2]   End-to-End Object Detection with Transformers [J].
Carion, Nicolas ;
Massa, Francisco ;
Synnaeve, Gabriel ;
Usunier, Nicolas ;
Kirillov, Alexander ;
Zagoruyko, Sergey .
COMPUTER VISION - ECCV 2020, PT I, 2020, 12346 :213-229
[3]  
Chen GB, 2017, ADV NEUR IN, V30
[4]  
Chen Q., 2020, Adv. Neural Inf. Process. Syst., V33, P21224
[5]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[6]  
Chong Zhiyu, 2021, INT C LEARN REPR
[7]   General Instance Distillation for Object Detection [J].
Dai, Xing ;
Jiang, Zeren ;
Wu, Zhao ;
Bao, Yiping ;
Wang, Zhicheng ;
Liu, Si ;
Zhou, Erjin .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :7838-7847
[8]   A Versatile Multi-View Framework for LiDAR-based 3D Object Detection with Guidance from Panoptic Segmentation [J].
Fazlali, Hamidreza ;
Xu, Yixuan ;
Ren, Yuan ;
Liu, Bingbing .
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, :17171-17180
[9]   Distilling Object Detectors via Decoupled Features [J].
Guo, Jianyuan ;
Han, Kai ;
Wang, Yunhe ;
Wu, Han ;
Chen, Xinghao ;
Xu, Chunjing ;
Xu, Chang .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :2154-2164
[10]  
Hinton G.E., 2015, Distilling the Knowledge in a Neural Network