Synthesising 3D Facial Motion from "In-the-Wild" Speech

Times cited: 6
Authors
Tzirakis, Panagiotis [1]
Papaioannou, Athanasios [1]
Lattas, Alexandros [1]
Tarasiou, Michail [1]
Schuller, Bjoern [1,2]
Zafeiriou, Stefanos [1]
Affiliations
[1] Imperial College London, Department of Computing, London, England
[2] University of Augsburg, ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, Augsburg, Germany
Source
2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020) | 2020
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
DOI
10.1109/FG47880.2020.00100
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Synthesising 3D facial motion from speech is a crucial problem with a multitude of applications, such as computer games and movies. Recently proposed methods tackle this problem under controlled speech conditions. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech that is captured in arbitrary recording conditions ("in-the-wild") and is independent of the speaker. For this purpose, we captured 4D sequences of people uttering the 500 words contained in Lip Reading in the Wild (LRW), a publicly available, large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes with the LRW audio samples using a novel time-warping technique, Deep Canonical Attentional Warping (DCAW). DCAW simultaneously learns hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate our proposed methods and show that a deep learning model can synthesise 3D facial motion while handling different speakers and continuous speech signals in uncontrolled conditions(1).
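The abstract does not describe how DCAW works internally. For reference only, here is a minimal sketch of classical dynamic time warping (DTW), the standard non-learned alignment that warping methods of this kind build on. It is a swapped-in textbook technique, not the authors' DCAW. DTW finds a monotonic warping path that pairs the frames of two sequences at minimum total distance. It also assumes both sequences already share one feature space. In the abstract, by contrast, DCAW learns non-linear representations jointly with the warping path, so it can relate audio samples to blendshape parameters. The function name `dtw` and the Euclidean frame distance are illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw(x: Sequence[Frame], y: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classical DTW (illustrative sketch, not DCAW).

    x and y are sequences of equal-dimension feature frames. Returns the total
    alignment cost and the warping path as (i, j) frame-index pairs.
    """
    if not x or not y:
        raise ValueError("both sequences must be non-empty")
    n, m = len(x), len(y)

    # Accumulated-cost table with an infinite border so the recursion
    # can only start from the (0, 0) corner.
    acc = [[math.inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Assumed local cost: Euclidean distance between the two frames.
            cost = math.dist(x[i - 1], y[j - 1])
            acc[i][j] = cost + min(
                acc[i - 1][j - 1],  # both sequences advance
                acc[i - 1][j],      # x advances, y frame repeats
                acc[i][j - 1],      # y advances, x frame repeats
            )

    # Backtrack from the final cell to (1, 1) to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        i, j = min(
            [(i - 1, j - 1), (i - 1, j), (i, j - 1)],
            key=lambda s: acc[s[0]][s[1]],
        )
        path.append((i - 1, j - 1))
    path.reverse()
    return acc[n][m], path
```

For example, `dtw([[0], [1], [2]], [[0], [0], [1], [2]])` aligns the repeated leading frame of the second sequence with the first frame of the first. It returns a cost of `0.0` and the path `[(0, 0), (0, 1), (1, 2), (2, 3)]`.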
Pages: 265-272
Number of pages: 8