INVARIANT MOTION REPRESENTATION LEARNING FOR 3D TALKING FACE SYNTHESIS

Cited by: 0

Authors:

Liu, Jiyuan [1]

Wei, Wenping [1]

Li, Zhendong [1,2]

Li, Guanfeng [1,2]

Liu, Hao [1,2]

Affiliations:

[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China

[2] Ningxia Key Lab Artificial Intelligence & Informa, Yinchuan, Peoples R China

Source:

2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024

Funding:

National Science Foundation (USA);

Keywords:

Talking face synthesis; NeRF models;

DOI:

10.1109/ICASSP48485.2024.10446379

Chinese Library Classification number:

Subject classification number:

Abstract:

In this paper, we propose an invariant motion representation learning method for deformable talking face synthesis. Conventional NeRF-based methods learn to match audio to motion without considering motion consistency information, which leads to blurry results, especially when face sequences are captured in in-the-wild conditions. To address this limitation, our model explores audio-motion invariance directly from video clips and exploits facial movements based on any given piece of speech. Specifically, we develop motion invariance and audio-motion contrastive learning modules and then produce facial motion to probe facial landmarks into intra-person identity and intra-motion classes. Thus, our proposed cycle-loop paradigm reinforces lip synchronization and inter-frame consistency. Experimental results show the effectiveness of our method.

Pages: 4700 - 4704

Page count: 5

50 items in total

[1] THE INVARIANT NATURE OF 3D REPRESENTATION FROM MOTION
PAVEL, M
WEINSHALL, D
INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 1994, 35 (04) : 1277 - 1277
[2] An Image Representation for the 3D Face Synthesis
Luo, Guoliang
Zeng, Wei
Xie, Wenqiang
Lei, Haopeng
Xian, Chuhua
PROCEEDINGS OF THE 31ST INTERNATIONAL CONFERENCE ON COMPUTER ANIMATION AND SOCIAL AGENTS (CASA 2016), 2015, : 27 - 31
[3] Disentangled Representation Learning for 3D Face Shape
Jiang, Zi-Hang
Wu, Qianyi
Chen, Keyu
Zhang, Juyong
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11949 - 11958
[4] 3D Face Representation Using Scale and Transform Invariant Features
Akaguenduez, Erdem
Ulusoy, Ilkay
2008 IEEE 16TH SIGNAL PROCESSING, COMMUNICATION AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2008, : 14 - 17
[5] Geometric Invariant Representation Learning for 3D Point Cloud
Li, Zongmin
Zhang, Yupeng
Bai, Yun
2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1480 - 1485
[6] Learning 3D Face Representation with Vision Transformer for Masked Face Recognition
Wang, Yuan
Yang, Zhen
Zhang, Zhiqiang
Zang, Huaijuan
Zhu, Qiang
Zhan, Shu
2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 505 - 511
[7] Face feature detection for 3D model of talking head with speech synthesis
Talafova, R.
Rozinaj, G.
2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES, 2007, : 339 - +
[8] Motion recognition and synthesis based on 3D sparse representation
Xiang, Jian
Liang, Ronghua
SIGNAL PROCESSING, 2015, 110 : 82 - 93
[9] Learning Distribution Independent Latent Representation for 3D Face Disentanglement
Zhang, Zihui
Yu, Cuican
Li, Huibin
Sun, Jian
Liu, Feng
2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 848 - 857
[10] Learning Robust 3D Face Reconstruction and Discriminative Identity Representation
Luo, Yao
Tu, Xiaoguang
Xie, Mei
2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, : 317 - 321