Structured Knowledge Distillation for Accurate and Efficient Object Detection

Cited by: 20
Authors
Zhang, Linfeng [1 ]
Ma, Kaisheng [1 ]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing 100084, Peoples R China
Keywords
Attention; instance segmentation; knowledge distillation; model acceleration and compression; non-local module; object detection; student-teacher learning
DOI
10.1109/TPAMI.2023.3300470
Chinese Library Classification
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Knowledge distillation, which aims to transfer the knowledge learned by a cumbersome teacher model to a lightweight student model, has become one of the most popular and effective techniques in computer vision. However, many previous knowledge distillation methods were designed for image classification and fail on more challenging tasks such as object detection. In this paper, we first suggest that the failure of knowledge distillation on object detection is mainly caused by two factors: (1) the imbalance between foreground and background pixels and (2) the lack of distillation on the relations among different pixels. We then propose a structured knowledge distillation scheme, comprising attention-guided distillation and non-local distillation, to address these two issues, respectively. Attention-guided distillation uses an attention mechanism to find the crucial pixels of foreground objects and makes the student devote more effort to learning their features. Non-local distillation enables the student to learn not only the features of individual pixels but also the relations among pixels, as captured by non-local modules. Experimental results demonstrate the effectiveness of our method on thirteen object detection models against twelve comparison methods, for both object detection and instance segmentation. For instance, Faster RCNN with our distillation achieves 43.9 mAP on MS COCO2017, 4.1 mAP higher than the baseline. Additionally, we show that our method also improves the robustness and domain generalization ability of detectors. Code and model weights have been released on GitHub.
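To make the two losses concrete, the following PyTorch sketch shows one plausible form of attention-guided and non-local feature distillation. It is an illustrative reconstruction from the abstract, not the authors' released code: the attention map (channel-mean of absolute activations with a softmax temperature), the embedded dot-product relation matrix, and all function names are assumptions, and student and teacher features are assumed to have matching shapes (e.g., via a 1x1 adaptation layer).

```python
# Illustrative sketch of the two distillation losses described in the abstract.
# All names and formulations are assumptions, not the authors' implementation.
import torch
import torch.nn.functional as F

def spatial_attention(feat, temperature=0.5):
    # Attention map from the channel-mean of absolute activations,
    # softened into a distribution over the H*W spatial positions.
    attn = feat.abs().mean(dim=1).flatten(1)          # (N, H*W)
    return F.softmax(attn / temperature, dim=1)

def attention_guided_loss(f_student, f_teacher):
    # Weight the per-pixel feature imitation error by the teacher's
    # attention, so foreground pixels dominate over background pixels.
    with torch.no_grad():
        w = spatial_attention(f_teacher)              # (N, H*W)
    diff = (f_student - f_teacher).pow(2).mean(dim=1).flatten(1)  # (N, H*W)
    return (w * diff).sum(dim=1).mean()

def pixel_relation(feat):
    # Pairwise pixel-relation matrix via a dot product over channels,
    # a simplified stand-in for a non-local module's affinity map.
    c = feat.shape[1]
    x = feat.flatten(2)                               # (N, C, H*W)
    return torch.bmm(x.transpose(1, 2), x) / c        # (N, H*W, H*W)

def nonlocal_distillation_loss(f_student, f_teacher):
    # Match the student's pixel-relation structure to the teacher's.
    return F.mse_loss(pixel_relation(f_student),
                      pixel_relation(f_teacher.detach()))
```

In a full training pipeline, these two terms would be added to the detector's task loss with balancing weights tuned per architecture.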
Pages: 15706-15724
Page count: 19