Unsupervised learning of style-aware facial animation from real acting performances

Cited by: 6
Authors
Paier, Wolfgang [1]
Hilsmann, Anna [1]
Eisert, Peter [1, 2]
Affiliations
[1] Fraunhofer Heinrich Hertz Inst, Berlin, Germany
[2] Humboldt Univ, Berlin, Germany
Funding
European Union Horizon 2020
Keywords
Facial animation; Neural rendering; Neural animation; Self-supervised learning; Dynamic textures; VIDEO; MODEL;
DOI
10.1016/j.gmod.2023.101199
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
This paper presents a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering. Training a VAE for geometry and texture yields a parametric model for accurate capture and realistic synthesis of facial expressions from a latent feature vector. Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters. In contrast to previous approaches, our animation model learns to disentangle and synthesize different acting styles in an unsupervised manner, requiring only phonetic labels that describe the content of the training sequences. For realistic real-time rendering, we train a U-Net that refines rasterization-based renderings by computing improved pixel colors and a foreground matte. We compare our framework qualitatively and quantitatively against recent methods for head modeling and facial animation, and we evaluate the perceived rendering and animation quality in a user study, which indicates large improvements over state-of-the-art approaches.
Pages: 13