Contour wavelet diffusion: A fast and high-quality image generation model

Cited: 0
Authors
Ding, Yaoyao [1 ,2 ]
Zhu, Xiaoxi [3 ]
Zou, Yuntao [4 ]
Affiliations
[1] Nanjing Univ Arts, Purple Acad Culture & Creat, Nanjing, Jiangsu, Peoples R China
[2] Macau Univ Sci & Technol, Fac Humanities & Arts, Macau, Macao, Peoples R China
[3] Jiangsu Univ, Coll Art, Zhenjiang, Jiangsu, Peoples R China
[4] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
Keywords
contour wavelet; diffusion; image generation; latent space; transform; design
DOI
10.1111/coin.12644
CLC number
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Diffusion models can generate high-quality images and have attracted increasing attention. However, they rely on a progressive optimization process and therefore suffer from long training and inference times, which limits their application in realistic scenarios. Recently, latent-space diffusion models have partially accelerated training by operating on feature-space representations, but their additional network structures still incur a large amount of unnecessary computation. We therefore propose Contour Wavelet Diffusion to accelerate both training and inference. First, we introduce the contour wavelet transform to extract anisotropic low-frequency and high-frequency components from the input image, and achieve acceleration by processing these down-sampled components; because wavelet transforms have good reconstructive properties, the quality of generated images is maintained. Second, we propose a batch-normalized stochastic attention module that enables the model to focus effectively on important high-frequency information, further improving the quality of image generation. Finally, we propose a balanced loss function that further improves the convergence speed of the model. Experimental results on several public datasets show that our method significantly accelerates the training and inference of the diffusion model while preserving the quality of generated images.
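The core acceleration idea in the abstract (run the diffusion process on half-resolution wavelet sub-bands rather than on the full image, relying on the transform's exact reconstruction to preserve quality) can be sketched with a plain 2D Haar wavelet as a stand-in. This is a minimal illustration only: the paper's contour wavelet transform is anisotropic and directional and is not reproduced here, and all function names below are hypothetical.

```python
import numpy as np

def haar_dwt2(x):
    """Single-level 2D Haar transform.

    Splits an (H, W) image into four (H/2, W/2) sub-bands:
    LL (low-pass approximation) and LH, HL, HH (high-frequency detail).
    A diffusion model operating on these quarter-size arrays processes
    4x fewer pixels per band than on the original image.
    """
    a = (x[0::2, :] + x[1::2, :]) / 2.0   # row-wise average
    d = (x[0::2, :] - x[1::2, :]) / 2.0   # row-wise difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse single-level 2D Haar transform: exact reconstruction."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = a + d
    x[1::2, :] = a - d
    return x
```

The round trip `haar_idwt2(*haar_dwt2(x))` recovers `x` exactly (up to floating-point error), which is the "good reconstructive properties" the abstract appeals to: samples generated in the sub-band domain lose nothing when mapped back to pixel space.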
Pages: 19