Contour wavelet diffusion - a fast and high-quality facial expression generation model

Cited: 0
Authors
Xu, Chenwei [1 ]
Zou, Yuntao [2 ,3 ]
Affiliations
[1] Commun Univ Zhejiang, Sch Design & Art, Hangzhou, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Luoyu Rd 1037, Wuhan 430074, Peoples R China
Keywords
Facial expression generation; diffusion model; contour wavelet; transform; design
DOI
10.1080/09540091.2024.2316023
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Facial expressions are important for conveying information in human interactions. Diffusion models can generate high-quality images of clear, discriminative faces, but their long training and inference times hamper practical application. Latent-space diffusion models have shown promise in speeding up training by operating on feature-space parameters, but they require additional network structures. To address these limitations, we propose a contour wavelet diffusion model that accelerates both training and inference. We use a contour wavelet transform to extract components from images and features, achieving substantial acceleration while preserving reconstruction quality. A normalised random channel attention module enhances the quality of generated images by focusing on high-frequency information, and a reconstruction loss function improves convergence speed. Experimental results demonstrate that our approach boosts the training and inference speeds of diffusion models without sacrificing image quality. Fast generation of facial expressions provides a smoother and more natural user experience, which is important for real-time applications. In addition, faster inference reduces computational resource usage, system cost, and energy consumption, which facilitates the broader development and application of this technology.
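The core acceleration idea in the abstract is a wavelet decomposition that separates an image into a small low-frequency approximation (on which diffusion can run at reduced resolution) and high-frequency detail bands (which the attention module emphasises). As an illustration only — the paper uses a contour wavelet transform, not the simple Haar transform sketched here, and `haar2d`/`ihaar2d` are hypothetical helper names — one decomposition level can be written as:

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar wavelet transform.

    Splits an (H, W) array into four (H/2, W/2) subbands:
    LL (low-frequency approximation) and LH, HL, HH (detail bands).
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # low-low: coarse image
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: exact reconstruction of the original image."""
    h2, w2 = ll.shape
    a = np.empty((h2, 2 * w2))
    d = np.empty((h2, 2 * w2))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    img = np.empty((2 * h2, 2 * w2))
    img[0::2, :] = a + d
    img[1::2, :] = a - d
    return img
```

Because LL has one quarter of the original pixels, a denoising network applied per subband (or to LL alone) processes far less data per step, which is consistent with the training and inference speedups the abstract claims; the transform is invertible, so no image content is lost.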
Pages: 20
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2003, 12 (01) : 16 - 28