Unsupervised learning of style-aware facial animation from real acting performances

Cited by: 6
Authors
Paier, Wolfgang [1 ]
Hilsmann, Anna [1 ]
Eisert, Peter [1 ,2 ]
Affiliations
[1] Fraunhofer Heinrich Hertz Inst, Berlin, Germany
[2] Humboldt Univ, Berlin, Germany
Funding
EU Horizon 2020;
Keywords
Facial animation; Neural rendering; Neural animation; Self-supervised learning; Dynamic textures; VIDEO; MODEL;
DOI
10.1016/j.gmod.2023.101199
CLC classification code
TP31 [Computer software];
Subject classification codes
081202 ; 0835 ;
Abstract
This paper presents a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering. Training a VAE for geometry and texture yields a parametric model for accurately capturing and realistically synthesizing facial expressions from a latent feature vector. Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters. In contrast to previous approaches, our animation model learns to disentangle and synthesize different acting styles in an unsupervised manner, requiring only phonetic labels that describe the content of the training sequences. For realistic real-time rendering, we train a U-Net that refines rasterization-based renderings by computing improved pixel colors and a foreground matte. We compare our framework qualitatively and quantitatively against recent methods for head modeling and facial animation, and evaluate the perceived rendering and animation quality in a user study, which indicates large improvements over state-of-the-art approaches.
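To illustrate the rendering-refinement step mentioned in the abstract, the following is a minimal, hypothetical PyTorch sketch (not the authors' implementation; all layer sizes, channel counts, and names are assumptions) of a small U-Net-style network that takes a rasterization-based rendering and predicts refined pixel colors together with a foreground matte.

    # Illustrative sketch only: a tiny U-Net-style refiner that maps a rasterized
    # rendering to improved RGB colors plus a soft foreground matte.
    import torch
    import torch.nn as nn

    class RefinementUNet(nn.Module):
        def __init__(self, in_ch: int = 3, base: int = 32):
            super().__init__()
            # Encoder: full-resolution stage followed by one downsampling stage
            self.enc1 = nn.Sequential(
                nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU(),
                nn.Conv2d(base, base, 3, padding=1), nn.ReLU())
            self.enc2 = nn.Sequential(
                nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(base * 2, base * 2, 3, padding=1), nn.ReLU())
            # Decoder: upsample and fuse with the skip connection from enc1
            self.up = nn.ConvTranspose2d(base * 2, base, 2, stride=2)
            self.dec = nn.Sequential(
                nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
            # Output heads: 3 refined color channels and 1 foreground matte channel
            self.out_rgb = nn.Conv2d(base, 3, 1)
            self.out_matte = nn.Conv2d(base, 1, 1)

        def forward(self, raster: torch.Tensor):
            s1 = self.enc1(raster)                    # full-resolution features
            s2 = self.enc2(s1)                        # half-resolution features
            x = self.dec(torch.cat([self.up(s2), s1], dim=1))
            rgb = torch.sigmoid(self.out_rgb(x))      # refined pixel colors
            matte = torch.sigmoid(self.out_matte(x))  # soft foreground alpha
            return rgb, matte

    # Example usage on a batch of rasterized head renderings:
    # render = torch.rand(1, 3, 256, 256)
    # rgb, matte = RefinementUNet()(render)

In the paper's setting, such a refiner would be trained against real video frames so that rasterization artifacts are corrected and the head can be composited onto arbitrary backgrounds via the predicted matte; the sketch above only conveys the input/output structure, not the actual architecture or losses used by the authors.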
Pages: 13