Contour wavelet diffusion - a fast and high-quality facial expression generation model

Cited: 0
Authors
Xu, Chenwei [1 ]
Zou, Yuntao [2 ,3 ]
Affiliations
[1] Commun Univ Zhejiang, Sch Design & Art, Hangzhou, Peoples R China
[2] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Wuhan, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Comp Sci & Technol, Luoyu Rd 1037, Wuhan 430074, Peoples R China
Keywords
Facial expression generation; diffusion model; contour wavelet; transform; design
DOI
10.1080/09540091.2024.2316023
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Facial expressions are important for conveying information in human interactions. Diffusion models can generate high-quality images of clear, discriminative faces, but their long training and inference times hamper practical application. Latent-space diffusion models have shown promise in speeding up training by operating on feature-space parameters, but they require additional network structures. To address these limitations, we propose a contour wavelet diffusion model that accelerates both training and inference. We use a contour wavelet transform to extract components from images and features, achieving substantial acceleration while preserving reconstruction quality. A normalised random channel attention module enhances the quality of generated images by focusing on high-frequency information, and a reconstruction loss function improves convergence speed. Experimental results demonstrate that our approach boosts the training and inference speeds of diffusion models without sacrificing image quality. Fast generation of facial expressions provides a smoother and more natural user experience, which is important for real-time applications. In addition, faster inference reduces computational resource usage, system cost, and energy consumption, which facilitates the broader development and application of this technology.
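The core acceleration idea in the abstract is a wavelet decomposition that separates an image into a small low-frequency approximation (on which diffusion can run at reduced resolution) and high-frequency detail bands (which the attention module emphasises). As an illustration only — the paper uses a contour wavelet transform, not the simple Haar transform sketched here, and `haar2d`/`ihaar2d` are hypothetical helper names — one decomposition level can be written as:

```python
import numpy as np

def haar2d(img):
    """One level of a 2-D Haar wavelet transform.

    Splits an (H, W) array into four (H/2, W/2) subbands:
    LL (low-frequency approximation) and LH, HL, HH (detail bands).
    """
    a = (img[0::2, :] + img[1::2, :]) / 2.0   # vertical average
    d = (img[0::2, :] - img[1::2, :]) / 2.0   # vertical difference
    ll = (a[:, 0::2] + a[:, 1::2]) / 2.0      # low-low: coarse image
    lh = (a[:, 0::2] - a[:, 1::2]) / 2.0      # horizontal detail
    hl = (d[:, 0::2] + d[:, 1::2]) / 2.0      # vertical detail
    hh = (d[:, 0::2] - d[:, 1::2]) / 2.0      # diagonal detail
    return ll, lh, hl, hh

def ihaar2d(ll, lh, hl, hh):
    """Inverse of haar2d: exact reconstruction of the original image."""
    h2, w2 = ll.shape
    a = np.empty((h2, 2 * w2))
    d = np.empty((h2, 2 * w2))
    a[:, 0::2] = ll + lh
    a[:, 1::2] = ll - lh
    d[:, 0::2] = hl + hh
    d[:, 1::2] = hl - hh
    img = np.empty((2 * h2, 2 * w2))
    img[0::2, :] = a + d
    img[1::2, :] = a - d
    return img
```

Because LL has one quarter of the original pixels, a denoising network applied per subband (or to LL alone) processes far less data per step, which is consistent with the training and inference speedups the abstract claims; the transform is invertible, so no image content is lost.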
Pages: 20
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2003, 12 (01) : 16 - 28