Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data

Cited: 0
Authors
Fang, Gongfan [1 ,4 ]
Bao, Yifan [1 ]
Song, Jie [1 ]
Wang, Xinchao [2 ]
Xie, Donglin [1 ]
Shen, Chengchao [3 ]
Song, Mingli [1 ]
Affiliations
[1] Zhejiang Univ, Hangzhou, Peoples R China
[2] Natl Univ Singapore, Singapore, Singapore
[3] Cent South Univ, Changsha, Peoples R China
[4] Alibaba Zhejiang Univ Joint Inst Frontier Technol, Hangzhou, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Funding
National Natural Science Foundation of China;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Knowledge distillation (KD) aims to craft a compact student model that imitates the behavior of a pre-trained teacher in a target domain. Prior KD approaches, despite their gratifying results, have largely relied on the premise that in-domain data is available to carry out the knowledge transfer. This assumption, unfortunately, is violated in many practical settings, since the original training data, or even the data domain, is often unreachable due to privacy or copyright reasons. In this paper, we attempt to tackle an ambitious task, termed out-of-domain knowledge distillation (OOD-KD), which allows us to conduct KD using only OOD data that can be readily obtained at very low cost. Admittedly, OOD-KD is by nature a highly challenging task due to the agnostic domain gap. To this end, we introduce a handy yet surprisingly efficacious approach, dubbed MosaicKD. The key insight behind MosaicKD is that samples from various domains share common local patterns, even though their global semantics may vary significantly; these shared local patterns can, in turn, be re-assembled, analogous to mosaic tiling, to approximate the in-domain data and thereby alleviate the domain discrepancy. In MosaicKD, this is achieved through a four-player min-max game, in which a generator, a discriminator, and a student network are collectively trained in an adversarial manner, partially under the guidance of a pre-trained teacher. We validate MosaicKD on classification and semantic segmentation tasks across various benchmarks, and demonstrate that it yields results far superior to state-of-the-art counterparts on OOD data. Our code is available at https://github.com/zju-vipa/MosaicKD.
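To make the four-player game concrete, below is a minimal PyTorch-style sketch of one training step, assuming a generator G, a discriminator D, a student S, and a frozen pre-trained teacher T. The loss weights (w_adv, w_align, w_kd), the teacher-confidence alignment term, and the use of a plain (rather than patch-level) discriminator are illustrative assumptions for readability, not the authors' exact formulation; see the released repository for the actual implementation.

import torch
import torch.nn.functional as F

def mosaickd_step(G, D, S, T, ood_images, z_dim, opt_g, opt_d, opt_s,
                  w_adv=1.0, w_align=1.0, w_kd=1.0):
    # One step of the four-player min-max game (illustrative sketch).
    # T is the pre-trained teacher and is kept frozen throughout.
    z = torch.randn(ood_images.size(0), z_dim, device=ood_images.device)
    fake = G(z)

    # 1) Discriminator: tell real OOD samples from synthesized ones.
    #    (MosaicKD matches shared *local* patterns; a global discriminator
    #    is used here purely for brevity.)
    d_real, d_fake = D(ood_images), D(fake.detach())
    loss_d = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real))
              + F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # 2) Generator: synthesize "mosaics" that (a) fool the discriminator,
    #    (b) are confidently recognized by the teacher (in-domain semantics),
    #    and (c) are hard for the student (maximize teacher-student gap).
    t_logits, s_logits, d_out = T(fake), S(fake), D(fake)
    loss_adv = F.binary_cross_entropy_with_logits(d_out, torch.ones_like(d_out))
    loss_align = F.cross_entropy(t_logits, t_logits.argmax(dim=1))
    loss_kd = F.kl_div(F.log_softmax(s_logits, dim=1),
                       F.softmax(t_logits, dim=1), reduction='batchmean')
    loss_g = w_adv * loss_adv + w_align * loss_align - w_kd * loss_kd
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()

    # 3) Student: imitate the teacher on the synthesized samples.
    fake = fake.detach()
    with torch.no_grad():
        t_target = F.softmax(T(fake), dim=1)
    loss_s = F.kl_div(F.log_softmax(S(fake), dim=1), t_target,
                      reduction='batchmean')
    opt_s.zero_grad(); loss_s.backward(); opt_s.step()

Note the sign of the w_kd term: the generator ascends the teacher-student divergence that the student then descends, which is what makes the game adversarial rather than purely cooperative.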
Pages: 13