Synthesising 3D Facial Motion from "In-the-Wild" Speech

Cited by: 6
Authors
Tzirakis, Panagiotis [1 ]
Papaioannou, Athanasios [1 ]
Lattas, Alexandros [1 ]
Tarasiou, Michail [1 ]
Schuller, Bjoern [1 ,2 ]
Zafeiriou, Stefanos [1 ]
Affiliations
[1] Imperial College London, Department of Computing, London, UK
[2] University of Augsburg, ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, Augsburg, Germany
Source
2020 15th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2020), 2020
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
DOI
10.1109/FG47880.2020.00100
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Synthesising 3D facial motion from speech is a crucial problem with a multitude of applications such as computer games and movies. Recently proposed methods tackle this problem only under controlled speech recording conditions. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech captured in arbitrary recording conditions ("in-the-wild") and independent of the speaker. For this purpose, we captured 4D sequences of people uttering the 500 words contained in the Lip Reading in the Wild (LRW) dataset, a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes with the LRW audio samples by means of a novel time-warping technique, named Deep Canonical Attentional Warping (DCAW), which simultaneously learns hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate the proposed methods and show that a deep learning model can synthesise 3D facial motion while handling different speakers and continuous speech signals in uncontrolled conditions.
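The abstract does not detail the DCAW architecture, so the following is only a minimal sketch of the general idea it describes: two non-linear sequence encoders (audio and 3D blendshape parameters), a soft attentional alignment acting as a differentiable warping path, and a correlation-style objective trained end-to-end. All class names, dimensions, and the simplified per-dimension correlation loss below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical DCAW-style sketch (assumed names and sizes, not the paper's code).
import torch
import torch.nn as nn

class DCAWSketch(nn.Module):
    def __init__(self, audio_dim=40, shape_dim=30, hidden=128):
        super().__init__()
        # Non-linear encoders for the two views: audio features and blendshape parameters.
        self.audio_enc = nn.GRU(audio_dim, hidden, batch_first=True)
        self.shape_enc = nn.GRU(shape_dim, hidden, batch_first=True)

    def forward(self, audio, shapes):
        # audio: (B, Ta, audio_dim), shapes: (B, Ts, shape_dim)
        a, _ = self.audio_enc(audio)           # (B, Ta, H)
        s, _ = self.shape_enc(shapes)          # (B, Ts, H)
        # Attentional warping: soft alignment of every audio frame to the shape frames.
        attn = torch.softmax(a @ s.transpose(1, 2) / a.shape[-1] ** 0.5, dim=-1)  # (B, Ta, Ts)
        s_warped = attn @ s                    # shape stream re-sampled onto the audio timeline
        return a, s_warped

def correlation_loss(a, s_warped, eps=1e-6):
    # Negative per-dimension correlation between the aligned streams:
    # a simple stand-in for a canonical-correlation style objective.
    a = a - a.mean(dim=1, keepdim=True)
    s = s_warped - s_warped.mean(dim=1, keepdim=True)
    num = (a * s).sum(dim=1)
    den = a.norm(dim=1) * s.norm(dim=1) + eps
    return -(num / den).mean()

# Usage: warp the blendshape stream onto the audio timeline and maximise correlation.
audio = torch.randn(2, 120, 40)    # e.g. 120 frames of 40-dim log-mel features (assumed)
shapes = torch.randn(2, 60, 30)    # e.g. 60 frames of 30 blendshape parameters (assumed)
model = DCAWSketch()
a, s_w = model(audio, shapes)
loss = correlation_loss(a, s_w)
loss.backward()
```

In this sketch the attention map plays the role of a differentiable warping path: each audio frame attends to a convex combination of shape frames, so the alignment and both encoders can be optimised jointly, in the end-to-end spirit described in the abstract.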
Pages: 265-272
Number of pages: 8