Online Knowledge Distillation via Collaborative Learning

Cited by: 217
Authors
Guo, Qiushan [1 ]
Wang, Xinjiang [2 ]
Wu, Yichao [2 ]
Yu, Zhipeng [2 ]
Liang, Ding [2 ]
Hu, Xiaolin [3 ]
Luo, Ping [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Beijing, Peoples R China
[2] SenseTime Grp Ltd, Hong Kong, Peoples R China
[3] Tsinghua Univ, Beijing, Peoples R China
[4] Univ Hong Kong, Hong Kong, Peoples R China
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Funding
National Natural Science Foundation of China;
Keywords
NEURAL-NETWORKS;
DOI
10.1109/CVPR42600.2020.01103
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work presents an efficient yet effective online Knowledge Distillation method via Collaborative Learning, termed KDCL, which consistently improves the generalization ability of deep neural networks (DNNs) with different learning capacities. Unlike existing two-stage knowledge distillation approaches that pre-train a large-capacity DNN as the "teacher" and then transfer its knowledge to a "student" DNN unidirectionally (i.e., one-way), KDCL treats all DNNs as "students" and trains them collaboratively in a single stage (knowledge is transferred among arbitrary students during collaborative training), enabling parallel computing, fast computation, and appealing generalization ability. Specifically, we carefully design multiple methods to generate soft targets as supervision by effectively ensembling the students' predictions and distorting the input images. Extensive experiments show that KDCL consistently improves all the "students" on different datasets, including CIFAR-100 and ImageNet. For example, when trained together with KDCL, ResNet-50 and MobileNetV2 achieve 78.2% and 74.0% top-1 accuracy on ImageNet, outperforming the original results by 1.4% and 2.0% respectively. We also verify that models pre-trained with KDCL transfer well to object detection and semantic segmentation on the MS COCO dataset. For instance, the FPN detector is improved by 0.9% mAP.
Pages: 11017-11026
Page count: 10
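The abstract describes KDCL's core mechanism: all networks are trained as peers, and a soft target is produced by ensembling the students' predictions. Below is a minimal PyTorch sketch of that idea, assuming the simplest possible ensembling strategy, a plain average of temperature-softened peer outputs, combined with standard hard-label and KL distillation losses; the paper itself proposes several more refined soft-target generators and feeds each student a differently distorted view of the input, so every name and hyperparameter here is illustrative rather than the authors' exact method.

```python
import torch
import torch.nn.functional as F

def kdcl_step(students, optimizers, images, labels, T=3.0, alpha=0.5):
    """One collaborative training step in the spirit of KDCL (illustrative).

    Each network in `students` is a peer; the soft target is a plain
    average of their temperature-softened predictions. KDCL additionally
    distorts the input differently per student and designs several
    ensembling variants; both are omitted here for brevity.
    """
    logits = [net(images) for net in students]

    # Ensemble soft target: average of softened peer predictions,
    # detached so no gradient flows back through the target.
    with torch.no_grad():
        soft_target = torch.stack(
            [F.softmax(l / T, dim=1) for l in logits]).mean(dim=0)

    losses = []
    for l in logits:
        ce = F.cross_entropy(l, labels)                 # hard-label loss
        kd = F.kl_div(F.log_softmax(l / T, dim=1),      # KL to ensemble target
                      soft_target, reduction="batchmean") * T * T
        losses.append((1.0 - alpha) * ce + alpha * kd)

    # The students' computation graphs are independent (the target is
    # detached), so one backward over the summed losses updates all peers.
    for opt in optimizers:
        opt.zero_grad()
    sum(losses).backward()
    for opt in optimizers:
        opt.step()
    return [loss.item() for loss in losses]
```

A call such as `kdcl_step([resnet50, mobilenetv2], [opt_r, opt_m], images, labels)` would mirror the paper's headline experiment of training ResNet-50 and MobileNetV2 together, with each network acting as both teacher and student in the same stage.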