UNIC: Universal Classification Models via Multi-teacher Distillation

Cited by: 0
Authors
Sariyildiz, Mert Bulent [1 ]
Weinzaepfel, Philippe [1 ]
Lucas, Thomas [1 ]
Larlus, Diane [1 ]
Kalantidis, Yannis [1 ]
Affiliations
[1] NAVER LABS Europe, Meylan, France
Source
COMPUTER VISION - ECCV 2024, PT IV | 2025 / Vol. 15062
Keywords
Multi-Teacher Distillation; Classification; Generalization; Knowledge Distillation; Ensemble
DOI
10.1007/978-3-031-73235-5_20
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Pretrained models have become a commodity and offer strong results on a broad range of tasks. In this work, we focus on classification and seek to learn a unique encoder able to take advantage of several complementary pretrained models, aiming at even stronger generalization across a variety of classification tasks. We propose to learn such an encoder via multi-teacher distillation. We first thoroughly analyze standard distillation when driven by multiple strong teachers with complementary strengths. Guided by this analysis, we gradually propose improvements to the basic distillation setup. Among those, we enrich the architecture of the encoder with a ladder of expendable projectors, which increases the impact of intermediate features during distillation, and we introduce teacher dropping, a regularization mechanism that better balances the teachers' influence. Our final distillation strategy leads to student models of the same capacity as any of the teachers, while retaining or improving upon the performance of the best teacher for each task.
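The two distillation ingredients named above can be illustrated with a minimal sketch. This is not the paper's implementation: the cosine-distance feature loss, the 50% drop probability, and all function names are illustrative assumptions; the paper's actual teacher-dropping rule and projector ladder are more involved.

```python
import numpy as np

def cosine_distill_loss(student_feat, teacher_feat):
    """Cosine-distance loss between L2-normalized student and teacher features."""
    s = student_feat / np.linalg.norm(student_feat)
    t = teacher_feat / np.linalg.norm(teacher_feat)
    return 1.0 - float(np.dot(s, t))

def multi_teacher_loss(student_feat, teacher_feats, drop_prob=0.5, rng=None):
    """Average distillation loss over teachers, with random 'teacher dropping':
    each teacher is dropped with probability drop_prob (at least one is kept),
    so no single teacher dominates the student's training signal."""
    rng = rng if rng is not None else np.random.default_rng()
    keep = rng.random(len(teacher_feats)) >= drop_prob
    if not keep.any():
        keep[rng.integers(len(teacher_feats))] = True  # always keep >= 1 teacher
    losses = [cosine_distill_loss(student_feat, t)
              for t, k in zip(teacher_feats, keep) if k]
    return sum(losses) / len(losses)
```

In a real training loop, the student features fed to each teacher's loss would come from per-teacher projector heads (the "ladder" of the paper), which are discarded after distillation so the final encoder matches a single teacher's capacity.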
Pages: 353-371
Page count: 19