Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Cited: 0
Authors
Fang, Gongfan [1 ,4 ]
Bao, Yifan [1 ]
Song, Jie [1 ]
Wang, Xinchao [2 ]
Xie, Donglin [1 ]
Shen, Chengchao [3 ]
Song, Mingli [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Cent South Univ, Changsha, Peoples R China
[4] Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain. Prior KD approaches, despite their gratifying results, have largely relied on the premise that in-domain data is available to carry out the knowledge transfer. This assumption, unfortunately, is violated in many practical settings, since the original training data, or even the data domain, is often unreachable due to privacy or copyright reasons. In this paper, we attempt to tackle an ambitious task, termed out-of-domain knowledge distillation (OOD-KD), which allows us to conduct KD using only OOD data that can be readily obtained at very low cost. Admittedly, OOD-KD is by nature a highly challenging task due to the agnostic domain gap. To this end, we introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD. The key insight behind MosaicKD is that samples from various domains share common local patterns, even though their global semantics may vary significantly; these shared local patterns can, in turn, be re-assembled, analogous to mosaic tiling, to approximate the in-domain data and thereby alleviate the domain discrepancy. In MosaicKD, this is achieved through a four-player min-max game, in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner, partially under the guidance of a pre-trained teacher. We validate MosaicKD on classification and semantic segmentation tasks across various benchmarks, and demonstrate that it yields results far superior to state-of-the-art counterparts on OOD data. Our code is available at https://github.com/zju-vipa/MosaicKD.
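To make the four-player game concrete, below is a minimal PyTorch-style sketch of one training step, assuming a generator G, a discriminator D, a student S, and a frozen pre-trained teacher T. The loss weights (w_adv, w_align, w_kd), the teacher-confidence alignment term, and the use of a plain (rather than patch-level) discriminator are illustrative assumptions for readability, not the authors' exact formulation; see the released repository for the actual implementation.

import torch
import torch.nn.functional as F

def mosaickd_step(G, D, S, T, ood_images, z_dim, opt_g, opt_d, opt_s,
                  w_adv=1.0, w_align=1.0, w_kd=1.0):
    # One step of the four-player min-max game (illustrative sketch).
    # T is the pre-trained teacher and is kept frozen throughout.
    z = torch.randn(ood_images.size(0), z_dim, device=ood_images.device)
    fake = G(z)

    # 1) Discriminator: tell real OOD samples from synthesized ones.
    #    (MosaicKD matches shared *local* patterns; a global discriminator
    #    is used here purely for brevity.)
    d_real, d_fake = D(ood_images), D(fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator: synthesize "mosaics" that (a) fool the discriminator,
    #    (b) are confidently recognized by the teacher (in-domain semantics),
    #    and (c) are hard for the student (maximize teacher-student gap).
    t_logits, s_logits, d_out = T(fake), S(fake), D(fake)
    loss_adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    loss_align = F.cross_entropy(t_logits, t_logits.argmax(dim=1))
    loss_kd = F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits, dim=1), reduction='batchmean')
    loss_g = w_adv * loss_adv + w_align * loss_align - w_kd * loss_kd
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # 3) Student: imitate the teacher on the synthesized samples.
    fake = fake.detach()
    with torch.no_grad():
        t_target = F.softmax(T(fake), dim=1)
    loss_s = F.kl_div(F.log_softmax(S(fake), dim=1), t_target,
                      reduction='batchmean')
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

Note the sign of the w_kd term: the generator ascends the teacher-student divergence that the student then descends, which is what makes the game adversarial rather than purely cooperative.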
Pages: 13