Ensemble Knowledge Distillation for Learning Improved and Efficient Networks

Cited by: 13
Authors
Asif, Umar [1 ]
Tang, Jianbin [1 ]
Harrer, Stefan [1 ]
Affiliations
[1] IBM Research Australia, Southbank, VIC, Australia
Source
ECAI 2020: 24TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2020, Vol. 325
DOI
10.3233/FAIA200188
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Ensemble models comprising deep Convolutional Neural Networks (CNNs) have shown significant improvements in model generalization, but at the cost of large computation and memory requirements. In this paper, we present a framework for learning compact CNN models with improved classification performance and model generalization. To this end, we propose a compact student architecture with parallel branches that are trained using ground-truth labels and information from high-capacity teacher networks in an ensemble-learning fashion. Our framework provides two main benefits: i) distilling knowledge from different teachers into the student network promotes heterogeneity in the features learned at different branches and enables the network to learn diverse solutions to the target problem; ii) coupling the branches of the student network through ensembling encourages collaboration and improves the quality of the final predictions by reducing variance in the network outputs. Experiments on the well-established CIFAR-10 and CIFAR-100 datasets show that our Ensemble Knowledge Distillation (EKD) improves classification accuracy and model generalization, especially when training data are limited. Experiments also show that our EKD-based compact networks achieve higher mean accuracy on the test datasets than other knowledge-distillation-based methods.
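The record does not reproduce the paper's equations, so the following is only a minimal PyTorch sketch of the kind of training objective the abstract describes: each student branch distills from one teacher while the ensembled (averaged) branch output is trained on ground-truth labels. The one-teacher-per-branch pairing, the function name ekd_loss, the temperature T, and the weight alpha are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def ekd_loss(branch_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hypothetical EKD-style objective (sketch, not the paper's exact loss).

    branch_logits:  list of [batch, classes] tensors, one per student branch.
    teacher_logits: list of [batch, classes] tensors, one teacher per branch
                    (an assumed pairing for illustration).
    """
    # Distillation term: match each branch's softened outputs to its teacher.
    kd = sum(
        F.kl_div(
            F.log_softmax(s / T, dim=1),
            F.softmax(t / T, dim=1),
            reduction="batchmean",
        ) * (T * T)  # rescale gradients of the softened targets
        for s, t in zip(branch_logits, teacher_logits)
    ) / len(branch_logits)

    # Ensemble term: average the branch logits and supervise with labels.
    ensemble = torch.stack(branch_logits).mean(dim=0)
    ce = F.cross_entropy(ensemble, labels)

    return alpha * kd + (1.0 - alpha) * ce
```

In this sketch, the (T * T) factor keeps the gradient scale of the softened distillation targets comparable to the cross-entropy term, following standard knowledge-distillation practice; averaging the branch logits before the cross-entropy reflects the coupling-through-ensembling idea stated in the abstract.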
Pages: 953-960
Number of pages: 8