Sketch-Guided Latent Diffusion Model for High-Fidelity Face Image Synthesis

Cited by: 4
Authors
Peng, Yichen [1 ]
Zhao, Chunqi [2 ]
Xie, Haoran [1 ]
Fukusato, Tsukasa [3 ]
Miyata, Kazunori [1 ]
Affiliations
[1] Japan Adv Inst Sci & Technol, Nomi, Ishikawa 9231292, Japan
[2] Univ Tokyo, Sch Creat Informat, Bunkyo, Tokyo 1138654, Japan
[3] Waseda Univ, Sch Fundamental Sci & Engn, Tokyo, Tokyo 1698555, Japan
Keywords
Diffusion model; image synthesis; sketch-guided image generation;
DOI
10.1109/ACCESS.2023.3346408
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline Code
0812 ;
Abstract
Synthesizing facial images from monochromatic sketches is one of the most fundamental tasks in image-to-image translation. However, it remains challenging to teach a model high-dimensional face features, such as geometry and color, together with the characteristics of the input sketches, which must be considered simultaneously. Existing methods often use sketches as indirect (or auxiliary) inputs to guide models, resulting in the loss of sketch features or in alterations to geometry information. In this paper, we introduce the Sketch-Guided Latent Diffusion Model (SGLDM), an LDM-based network architecture trained on a paired sketch-face dataset. We apply a Multi-Auto-Encoder (AE) to encode input sketches of the various regions of a face from pixel space into a feature map in latent space, which reduces the dimensionality of the sketch input while preserving the geometry-related information of local face details. We build a sketch-face paired dataset based on existing methods, XDoG and Sketch Simplification, that extract an edge map from an image. We then introduce Stochastic Region Abstraction (SRA), an approach that augments our dataset to improve the robustness of the SGLDM to arbitrarily abstract sketch inputs. An evaluation study shows that the SGLDM can synthesize high-quality face images with different expressions, facial accessories, and hairstyles from sketches at various abstraction levels. The code and model have been released on the project page: https://puckikk1202.github.io/difffacesketch2023/
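The dataset construction step described above relies on XDoG-style edge extraction from face photographs. The following is a minimal, self-contained XDoG operator in the standard Winnemöller formulation, given only as an illustration of the technique: the function name `xdog` and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def xdog(gray, sigma=1.0, k=1.6, gamma=0.98, epsilon=0.01, phi=10.0):
    """Extended Difference-of-Gaussians (XDoG) edge map in [0, 1].

    gray: 2-D float array with intensities in [0, 1].
    Returns an image where flat regions map toward white (1.0)
    and edges appear as dark strokes.
    """
    g1 = gaussian_filter(gray, sigma)        # fine-scale blur
    g2 = gaussian_filter(gray, k * sigma)    # coarse-scale blur
    d = g1 - gamma * g2                      # sharpened DoG response
    # Soft thresholding: white above epsilon, smooth tanh ramp below.
    out = np.where(d >= epsilon, 1.0, 1.0 + np.tanh(phi * (d - epsilon)))
    return np.clip(out, 0.0, 1.0)

# Toy example: a vertical step edge yields a dark stroke at the boundary.
img = np.zeros((32, 32))
img[:, 16:] = 1.0
edges = xdog(img)
```

In the paper's pipeline such an edge map would additionally be passed through Sketch Simplification, and SRA would randomly abstract away regional detail; both steps are omitted here.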
Pages: 5770-5780
Page count: 11