Balanced knowledge distillation for long-tailed learning

Cited by: 43
Authors
Zhang, Shaoyu [1 ,2 ]
Chen, Chen [1 ,2 ]
Hu, Xiyuan [3 ]
Peng, Silong [1 ,2 ,4 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Nanjing Univ Sci & Technol, Nanjing, Peoples R China
[4] Beijing ViSystem Co Ltd, Beijing, Peoples R China
Funding
U.S. National Science Foundation;
Keywords
Long-tailed learning; Knowledge distillation; Vision and text classification; SMOTE;
DOI
10.1016/j.neucom.2023.01.063
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Deep models trained on long-tailed datasets exhibit unsatisfactory performance on tail classes. Existing methods usually modify the classification loss to increase the learning focus on tail classes, which unexpectedly sacrifices the performance on head classes. In fact, this scheme leads to a contradiction between the two goals of long-tailed learning, i.e., learning generalizable representations and facilitating learning for tail classes. In this work, we explore knowledge distillation in long-tailed scenarios and propose a novel distillation framework, named Balanced Knowledge Distillation (BKD), to disentangle the contradiction between the two goals and achieve both simultaneously. Specifically, given a teacher model, we train the student model by minimizing the combination of an instance-balanced classification loss and a class-balanced distillation loss. The former benefits from sample diversity and learns generalizable representations, while the latter considers the class priors and facilitates learning for tail classes. We conduct extensive experiments on several long-tailed benchmark datasets and demonstrate that the proposed BKD is an effective knowledge distillation framework in long-tailed scenarios, as well as a competitive method for long-tailed learning. Our source code is available at https://github.com/EricZsy/BalancedKnowledgeDistillation. © 2023 Elsevier B.V. All rights reserved.
Pages: 36-46
Number of pages: 11
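
For readers who want a concrete picture of the objective described in the abstract, below is a minimal, hypothetical sketch of a BKD-style training loss in PyTorch. It assumes an inverse-class-frequency weighting for the distillation term and a standard temperature-scaled KL divergence; the paper's exact weighting scheme, temperature, and loss balancing may differ, and the function and parameter names (bkd_loss, class_counts, T, alpha) are illustrative only.

```python
# Illustrative sketch only, not the authors' implementation.
# Assumption: the class-balanced weighting uses normalized inverse class frequency.
import torch
import torch.nn.functional as F

def bkd_loss(student_logits, teacher_logits, targets, class_counts, T=2.0, alpha=1.0):
    """Instance-balanced cross-entropy + class-balanced distillation loss.

    student_logits, teacher_logits: (batch, num_classes) raw scores.
    targets: (batch,) ground-truth labels.
    class_counts: (num_classes,) per-class sample counts of the training set.
    """
    # Instance-balanced classification loss: plain cross-entropy over all samples,
    # preserving sample diversity for representation learning.
    ce = F.cross_entropy(student_logits, targets)

    # Class-balanced weights (assumption: inverse class frequency, normalized to mean 1).
    weights = 1.0 / class_counts.float()
    weights = weights / weights.sum() * len(class_counts)

    # Distillation term: KL divergence between temperature-softened distributions,
    # re-weighted per sample by the weight of its ground-truth class.
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    kl_per_sample = F.kl_div(log_p_student, p_teacher, reduction="none").sum(dim=1)
    kd = (weights[targets] * kl_per_sample).mean() * (T * T)

    return ce + alpha * kd
```

In this sketch the cross-entropy term treats every sample equally (instance-balanced), while the distillation term up-weights samples from rare classes, mirroring the division of labor between the two losses described in the abstract.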