UNIC: Universal Classification Models via Multi-teacher Distillation

Cited by: 0
Authors
Sariyildiz, Mert Bulent [1 ]
Weinzaepfel, Philippe [1 ]
Lucas, Thomas [1 ]
Larlus, Diane [1 ]
Kalantidis, Yannis [1 ]
Affiliations
[1] NAVER LABS Europe, Meylan, France
Source
COMPUTER VISION - ECCV 2024, PT IV | 2025 / Vol. 15062
Keywords
Multi-Teacher Distillation; Classification; Generalization; Knowledge Distillation; Ensemble
DOI
10.1007/978-3-031-73235-5_20
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a unique encoder able to take advantage of several complementary pretrained models, aiming at even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyze standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task.
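The two distillation ingredients named above can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine-distance feature loss, the 50% drop probability, and all function names are illustrative assumptions; the paper's actual teacher-dropping rule and projector ladder are more involved.

```python
import numpy as np

def cosine_distill_loss(student_feat, teacher_feat):
    """Cosine-distance loss between L2-normalized student and teacher features."""
    s = student_feat / np.linalg.norm(student_feat)
    t = teacher_feat / np.linalg.norm(teacher_feat)
    return 1.0 - float(np.dot(s, t))

def multi_teacher_loss(student_feat, teacher_feats, drop_prob=0.5, rng=None):
    """Average distillation loss over teachers, with random 'teacher dropping':
    each teacher is dropped with probability drop_prob (at least one is kept),
    so no single teacher dominates the student's training signal."""
    rng = rng if rng is not None else np.random.default_rng()
    keep = rng.random(len(teacher_feats)) >= drop_prob
    if not keep.any():
        keep[rng.integers(len(teacher_feats))] = True  # always keep >= 1 teacher
    losses = [cosine_distill_loss(student_feat, t)
              for t, k in zip(teacher_feats, keep) if k]
    return sum(losses) / len(losses)
```

In a real training loop, the student features fed to each teacher's loss would come from per-teacher projector heads (the "ladder" of the paper), which are discarded after distillation so the final encoder matches a single teacher's capacity.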
Pages: 353-371
Page count: 19