CKD: Cross-Task Knowledge Distillation for Text-to-Image Synthesis

Cited by: 58
Authors
Yuan, Mingkuan [1 ]
Peng, Yuxin [1 ]
Affiliations
[1] Peking Univ, Wangxuan Inst Comp Technol, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Visualization; Task analysis; Image synthesis; Generative adversarial networks; Neural networks; Image color analysis; Text-to-image synthesis; knowledge distillation; transfer learning; image semantic understanding; REPRESENTATION;
DOI
10.1109/TMM.2019.2951463
Chinese Library Classification
TP [automation and computer technology];
Discipline Code
0812;
Abstract
Text-to-image synthesis (T2IS), which automatically generates images conditioned on text descriptions, has drawn increasing interest recently. It is a highly challenging task that learns a mapping from the semantic space of text descriptions to the complex RGB pixel space of images. The main issues of T2IS lie in two aspects: semantic consistency and visual quality. Because text descriptions and image contents belong to different modalities, their distributions are inconsistent, so it is difficult to generate images whose semantic content is consistent with the text descriptions; this is the semantic consistency issue. Moreover, due to the discrepancy between the data distributions of real and synthetic images in the huge pixel space, it is hard to approximate the real data distribution well enough to synthesize photo-realistic images; this is the visual quality issue. To address these issues, we propose a cross-task knowledge distillation (CKD) approach that transfers knowledge from multiple image semantic understanding tasks into the T2IS task. Image semantic understanding tasks contain a wealth of knowledge about translating image content into semantic representations, which helps address the semantic consistency and visual quality issues of T2IS. Moreover, we design a multi-stage knowledge distillation paradigm that decomposes the distillation process into multiple stages. This paradigm effectively approximates the distribution of real images and captures the textual information for T2IS, improving the visual quality and semantic consistency of synthetic images. Comprehensive experiments on widely used datasets show the effectiveness of the proposed CKD approach.
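The multi-stage distillation paradigm described in the abstract can be sketched as a weighted sum of per-stage feature-matching losses between a pretrained image-understanding "teacher" and the generator "student". This is a minimal illustrative sketch only: the function names, the MSE objective, and the stage weights below are assumptions, not the paper's actual formulation.

```python
import numpy as np

def stage_loss(teacher_feat, student_feat):
    """Feature-matching MSE between teacher and student representations."""
    t = np.asarray(teacher_feat, dtype=np.float64)
    s = np.asarray(student_feat, dtype=np.float64)
    return float(np.mean((t - s) ** 2))

def multi_stage_distillation(teacher_feats, student_feats, weights):
    """Weighted sum of per-stage feature-matching losses.

    Each stage compares features of a real image (extracted by a
    pretrained image-understanding teacher network) against features
    of the generator's intermediate or final output (the student).
    """
    return sum(w * stage_loss(t, s)
               for t, s, w in zip(teacher_feats, student_feats, weights))

# Toy example with two distillation stages: the first stage disagrees,
# the second already matches, so only the first contributes to the loss.
teacher = [np.ones((2, 2)), np.ones((2, 2))]
student = [np.zeros((2, 2)), np.ones((2, 2))]
print(multi_stage_distillation(teacher, student, weights=[0.5, 1.0]))  # 0.5
```

Minimizing such a loss during generator training pushes synthetic-image features toward those of real images, which is the intuition behind transferring knowledge from image semantic understanding tasks into T2IS.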
Pages: 1955-1968
Page count: 14