Synthesising 3D Facial Motion from "In-the-Wild" Speech

Cited by: 6
Authors
Tzirakis, Panagiotis [1]
Papaioannou, Athanasios [1]
Lattas, Alexandros [1]
Tarasiou, Michail [1]
Schuller, Bjoern [1, 2]
Zafeiriou, Stefanos [1]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Augsburg, ZD B Chair Embedded Intelligence Hlth Care & Well, Augsburg, Germany
Source
2020 15TH IEEE INTERNATIONAL CONFERENCE ON AUTOMATIC FACE AND GESTURE RECOGNITION (FG 2020) | 2020
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
DOI
10.1109/FG47880.2020.00100
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Synthesising 3D facial motion from speech is a crucial problem arising in many applications, such as computer games and movies. Recently proposed methods tackle this problem only under controlled recording conditions. In this paper, we introduce the first methodology for 3D facial motion synthesis from speech that is captured in arbitrary recording conditions ("in-the-wild") and is independent of the speaker. For our purposes, we captured 4D sequences of people uttering the 500 words contained in Lip Reading in the Wild (LRW), a publicly available large-scale in-the-wild dataset, and built a set of 3D blendshapes appropriate for speech. We correlate the 3D shape parameters of the speech blendshapes to the LRW audio samples by means of a novel time-warping technique, named Deep Canonical Attentional Warping (DCAW), that can simultaneously learn hierarchical non-linear representations and a warping path in an end-to-end manner. We thoroughly evaluate our proposed methods and show that a deep learning model can synthesise 3D facial motion for different speakers and continuous speech signals in uncontrolled conditions.
Pages: 265-272
Number of pages: 8