Deep Mutual Learning

Cited by: 1307
Authors
Zhang, Ying [1 ,2 ]
Xiang, Tao [2 ]
Hospedales, Timothy M. [3 ]
Lu, Huchuan [1 ]
Affiliations
[1] Dalian Univ Technol, Dalian, Peoples R China
[2] Queen Mary Univ London, London, England
[3] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Source
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2018
Keywords
DOI
10.1109/CVPR.2018.00454
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Model distillation is an effective and widely used technique to transfer knowledge from a teacher to a student network. The typical application is to transfer from a powerful large network or ensemble to a small network, in order to meet the low-memory or fast execution requirements. In this paper, we present a deep mutual learning (DML) strategy. Different from the one-way transfer between a static pre-defined teacher and a student in model distillation, with DML, an ensemble of students learns collaboratively and teaches each other throughout the training process. Our experiments show that a variety of network architectures benefit from mutual learning and achieve compelling results on both category and instance recognition tasks. Surprisingly, it is revealed that no prior powerful teacher network is necessary: mutual learning of a collection of simple student networks works, and moreover outperforms distillation from a more powerful yet static teacher.
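The collaborative objective sketched in the abstract can be illustrated with a minimal two-student example. The sketch below is an assumption-laden NumPy illustration, not code from the paper: it assumes each student minimizes its own supervised cross-entropy plus a KL-divergence term that pulls its predictions toward its peer's, with the roles swapped for the other student; the function names (`softmax`, `dml_loss`) are hypothetical.

```python
import numpy as np

def softmax(z):
    # Numerically stable softmax over the last axis.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def dml_loss(logits_student, logits_peer, labels):
    # Illustrative mutual-learning loss for one student:
    # supervised cross-entropy with the true labels, plus
    # KL(p_peer || p_student), which nudges the student's
    # predictions toward its peer's. The peer's own loss is
    # obtained by swapping the two logit arguments.
    p_s = softmax(logits_student)
    p_t = softmax(logits_peer)
    n = len(labels)
    ce = -np.log(p_s[np.arange(n), labels]).mean()
    kl = (p_t * (np.log(p_t) - np.log(p_s))).sum(axis=-1).mean()
    return ce + kl
```

In training, each student would take a gradient step on its own `dml_loss` every mini-batch, so the "teacher" signal is never static: both networks keep improving and keep teaching each other. Since KL divergence is non-negative, each student's mutual-learning loss is bounded below by its plain cross-entropy, and the two coincide exactly when the peers already agree.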
Pages: 4320-4328
Page count: 9