Diffusion Probabilistic Model Made Slim

Cited by: 54
Authors
Yang, Xingyi [1 ]
Zhou, Daquan [2 ]
Feng, Jiashi [2 ]
Wang, Xinchao [1 ]
Affiliations
[1] Natl Univ Singapore, Singapore, Singapore
[2] ByteDance Inc, Culver City, CA USA
Source
2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023
Keywords
NATURAL IMAGES; STATISTICS; SPECTRA; GAN;
DOI
10.1109/CVPR52729.2023.02160
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Despite the recent visually-pleasing results achieved, the massive computational cost has been a long-standing flaw for diffusion probabilistic models (DPMs), which, in turn, greatly limits their applications on resource-limited platforms. Prior methods toward efficient DPMs, however, have largely focused on accelerating the testing yet overlooked their huge complexity and sizes. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small-sized latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthetic images. Through a thorough assessment, we find that DPM is intrinsically biased against high-frequency generation, and learns to recover different frequency components at different time-steps. These properties make compact networks unable to represent frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPM, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating in its architecture to enable frequency dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inverse weighting the objective based on spectrum magnitude. Experimental results demonstrate that SD achieves 8-18x computational complexity reduction as compared to the latent diffusion models on a series of conditional and unconditional image generation tasks while retaining competitive image fidelity.
Pages: 22552-22562
Page count: 11
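Illustrative sketch
The abstract describes "inverse weighting the objective based on spectrum magnitude" without giving the formula. The NumPy sketch below shows one possible reading of that idea: a frequency-domain squared error in which each frequency bin is weighted by the inverse of the target's spectrum magnitude, so that weak (typically high-frequency) components contribute more. The function name `spectrum_weighted_loss` and the parameters `alpha` and `eps` are hypothetical choices made for illustration; this is not the paper's exact loss, and it omits the distillation (teacher-student) setting entirely.

```python
import numpy as np


def spectrum_weighted_loss(pred, target, alpha=1.0, eps=1e-3):
    """Squared error in the 2D Fourier domain, weighted inversely to the
    target's spectrum magnitude (hypothetical formulation, for illustration).

    alpha = 0 disables the weighting; with the orthonormal FFT the result
    then equals the ordinary spatial mean squared error (Parseval).
    """
    pred_f = np.fft.fft2(pred, norm="ortho")
    target_f = np.fft.fft2(target, norm="ortho")

    # Bins where the target has little energy receive large weights.
    weights = 1.0 / (np.abs(target_f) + eps) ** alpha
    weights = weights / weights.mean()  # keep the average weight at 1

    sq_err = np.abs(pred_f - target_f) ** 2
    return float(np.mean(weights * sq_err))


if __name__ == "__main__":
    # Toy target dominated by low frequencies (a constant plus a slow cosine).
    x = np.tile(np.arange(16), (16, 1))
    target = 1.0 + np.cos(2 * np.pi * x / 16)

    # Two errors with identical energy, one low-frequency and one high-frequency.
    pred_low = target + 0.1 * np.cos(2 * np.pi * x / 16)
    pred_high = target + 0.1 * np.cos(2 * np.pi * 7 * x / 16)

    print("low-frequency error: ", spectrum_weighted_loss(pred_low, target))
    print("high-frequency error:", spectrum_weighted_loss(pred_high, target))
```

Under this assumed weighting, the two errors have the same plain MSE, but the high-frequency one is penalized more because the target carries almost no energy at that frequency.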