Neural Face Models for Example-Based Visual Speech Synthesis

Cited by: 3
Authors: Paier, Wolfgang; Hilsmann, Anna; Eisert, Peter
Source: CVMP 2020: The 17th ACM SIGGRAPH European Conference on Visual Media Production, 2020
Funding: EU Horizon 2020
Keywords: performance capture; hybrid face model; facial animation; visual speech synthesis; video
DOI: 10.1145/3429341.3429356
Chinese Library Classification: TP31 (computer software)
Subject Classification: 081202; 0835
Abstract
Creating realistic animations of human faces with computer graphics models is still a challenging task. It is typically solved either with tedious manual work or with motion-capture techniques that require specialised and costly hardware. Example-based animation approaches circumvent these problems by re-using captured data of real people. The data is split into short motion samples that can be looped or concatenated to create novel motion sequences. The obvious advantages of this approach are simplicity of use and high realism, since the data exhibits only real deformations. Rather than tuning the weights of a complex face rig, the animation task is performed at a higher level, by arranging typical motion samples so that the desired facial performance is achieved. Two difficulties with example-based approaches, however, are high memory requirements and the creation of artefact-free, realistic transitions between motion samples. We solve these problems by combining the realism and simplicity of example-based animation with the advantages of neural face models. Our neural face model can synthesise high-quality 3D face geometry and texture from a compact latent parameter vector. This latent representation reduces memory requirements by a factor of 100 and helps create seamless transitions between concatenated motion samples. In this paper, we present a marker-less approach to facial motion capture based on multi-view video. From the captured data, we learn a neural representation of facial expressions, which is used to seamlessly concatenate facial performances during the animation procedure. We demonstrate the effectiveness of our approach by synthesising mouthings for Swiss-German sign language from viseme query sequences.
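The abstract describes concatenating captured motion samples in a learned latent space, where the compact per-frame codes make seamless transitions between samples feasible. The paper's actual blending scheme is not given here, so the sketch below is only a hypothetical illustration of the general idea: two motion samples, each a sequence of latent vectors, are joined with a linear cross-fade over a short overlap window before being decoded frame by frame. `LATENT_DIM` and the overlap length are arbitrary assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical latent size; the paper only states the code is compact
# (roughly 100x smaller than raw geometry + texture data).
LATENT_DIM = 128

def blend_transition(sample_a, sample_b, overlap=5):
    """Concatenate two latent-space motion samples (frames x dims),
    cross-fading the last `overlap` frames of A into the first
    `overlap` frames of B so the decoded faces show no visible jump."""
    head = sample_a[:-overlap]
    tail = sample_b[overlap:]
    w = np.linspace(0.0, 1.0, overlap)[:, None]  # per-frame blend weight
    seam = (1.0 - w) * sample_a[-overlap:] + w * sample_b[:overlap]
    return np.concatenate([head, seam, tail], axis=0)

# Two toy "motion samples" standing in for encoded capture data.
rng = np.random.default_rng(0)
a = rng.normal(size=(20, LATENT_DIM))
b = rng.normal(size=(30, LATENT_DIM))
seq = blend_transition(a, b)
print(seq.shape)  # (45, 128): 15 frames of A, 5 blended, 25 of B
```

In a real pipeline each row of `seq` would be passed through the neural face model's decoder to produce geometry and texture; blending in latent space rather than on raw meshes is what keeps the transition plausible.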
Pages: 10
Related Papers (46 in total)
[1] Abdelaziz, Ahmed Hussen; Theobald, Barry-John; Binder, Justin; Fanelli, Gabriele; Dixon, Paul; Apostoloff, Nicholas; Weise, Thibaut; Kajareker, Sachin. Speaker-Independent Speech-Driven Visual Speech Synthesis using Domain-Adapted Acoustic Models. ICMI'19: Proceedings of the 2019 International Conference on Multimodal Interaction, 2019, pp. 220-225.
[2] Blanz, V.; Vetter, T. A Morphable Model for the Synthesis of 3D Faces. SIGGRAPH 99 Conference Proceedings, 1999, pp. 187-194.
[3] Borshukov, George. ACM SIGGRAPH 2006 Sketches, 2006.
[4] Boykov, Y.; Veksler, O.; Zabih, R. Fast Approximate Energy Minimization via Graph Cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(11), pp. 1222-1239.
[5] Bregler, C. Computer Graphics Proceedings, SIGGRAPH 97, 1997, p. 353. DOI: 10.1145/258734.258880.
[6] Cao, Chen; Weng, Yanlin; Zhou, Shun; Tong, Yiying; Zhou, Kun. FaceWarehouse: A 3D Facial Expression Database for Visual Computing. IEEE Transactions on Visualization and Computer Graphics, 2014, 20(3), pp. 413-425.
[7] Carranza, J.; Theobalt, C.; Magnor, M. A.; Seidel, H.-P. Free-Viewpoint Video of Human Actors. ACM Transactions on Graphics, 2003, 22(3), pp. 569-577.
[8] Casas, Dan; Volino, Marco; Collomosse, John; Hilton, Adrian. 4D Video Textures for Interactive Character Appearance. Computer Graphics Forum, 2014, 33(2), pp. 371-380.
[9] Cootes, T. F.; Edwards, G. J.; Taylor, C. J. Active Appearance Models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2001, 23(6), pp. 681-685.
[10] Cosatto, Eric; Graf, Hans Peter. Photo-Realistic Talking-Heads from Image Samples. IEEE Transactions on Multimedia, 2000, 2(3), pp. 152-163.