Diffusion Probabilistic Model Made Slim

Cited by: 54

Authors:

Yang, Xingyi [1]

Zhou, Daquan [2]

Feng, Jiashi [2]

Wang, Xinchao [1]

Affiliations:

[1] Natl Univ Singapore, Singapore, Singapore

[2] ByteDance Inc, Culver City, CA USA

Source:

2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2023

Keywords:

NATURAL IMAGES; STATISTICS; SPECTRA; GAN

DOI:

10.1109/CVPR52729.2023.02160

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Despite the visually pleasing results achieved recently, massive computational cost has been a long-standing flaw of diffusion probabilistic models (DPMs), which in turn greatly limits their application on resource-limited platforms. Prior methods towards efficient DPMs, however, have largely focused on accelerating testing while overlooking the models' huge complexity and size. In this paper, we make a dedicated attempt to lighten DPMs while striving to preserve their favourable performance. We start by training a small-sized latent diffusion model (LDM) from scratch, but observe a significant fidelity drop in the synthetic images. Through a thorough assessment, we find that a DPM is intrinsically biased against high-frequency generation, and learns to recover different frequency components at different time-steps. These properties make compact networks unable to represent frequency dynamics with accurate high-frequency estimation. To this end, we introduce a customized design for slim DPMs, which we term Spectral Diffusion (SD), for light-weight image synthesis. SD incorporates wavelet gating in its architecture to enable frequency-dynamic feature extraction at every reverse step, and conducts spectrum-aware distillation to promote high-frequency recovery by inversely weighting the objective based on spectrum magnitude. Experimental results demonstrate that SD achieves an 8-18x reduction in computational complexity compared to latent diffusion models on a series of conditional and unconditional image generation tasks, while retaining competitive image fidelity.
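The abstract describes the spectrum-aware distillation only at a high level: the objective is weighted inversely to spectrum magnitude so that weak high-frequency components count for more. The NumPy sketch below illustrates that general idea under stated assumptions; it is not the authors' formulation. The function names `spectrum_weighted_loss` and `cosine_image`, the parameters `alpha` and `eps`, the use of a 2-D FFT, and the choice of the teacher's spectrum as the weighting reference are all hypothetical choices made for illustration.

```python
import numpy as np


def spectrum_weighted_loss(student, teacher, alpha=1.0, eps=1e-8):
    """Illustrative frequency-weighted distillation loss (an assumption,
    not the paper's exact objective).

    Each frequency bin's squared error is scaled by the inverse of the
    teacher's spectrum magnitude raised to the power alpha, so bins where
    the teacher has little energy (typically high frequencies in natural
    images) receive larger weights. alpha=0 gives an unweighted loss.
    """
    # 2-D FFT over the last two (spatial) axes.
    s_hat = np.fft.fft2(student, axes=(-2, -1))
    t_hat = np.fft.fft2(teacher, axes=(-2, -1))
    # Inverse-magnitude weights; eps avoids division by zero.
    weight = 1.0 / (np.abs(t_hat) + eps) ** alpha
    return float(np.mean(weight * np.abs(s_hat - t_hat) ** 2))


def cosine_image(size, cycles, amplitude):
    """Hypothetical helper: a size x size image with a horizontal cosine."""
    x = np.arange(size)
    row = amplitude * np.cos(2.0 * np.pi * cycles * x / size)
    return np.tile(row, (size, 1))


if __name__ == "__main__":
    size = 16
    # Toy "teacher" output: strong low-frequency and weak high-frequency content.
    teacher = cosine_image(size, 1, 10.0) + cosine_image(size, 4, 0.1)
    # Two "student" outputs with equal-energy errors at different frequencies.
    low_freq_error = teacher + cosine_image(size, 1, 0.5)
    high_freq_error = teacher + cosine_image(size, 4, 0.5)
    print("low-frequency error loss: ", spectrum_weighted_loss(low_freq_error, teacher))
    print("high-frequency error loss:", spectrum_weighted_loss(high_freq_error, teacher))
```

In this toy setup the two students differ from the teacher by the same mean squared error in pixel space, yet the weighted loss penalises the high-frequency error more heavily because the teacher's spectrum is weaker there.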


Pages: 22552-22562

Page count: 11
