Implicit Memory-Based Variational Motion Talking Face Generation

Cited by: 2

Authors:

Yang, Daowu [1]

Huang, Sheng [1]

Jiang, Wen [1]

Zou, Jin [1]

Affiliation:

[1] Hunan Int Econ Univ, Changsha 410205, Peoples R China

Source:

IEEE SIGNAL PROCESSING LETTERS | 2024 / Vol. 31

Keywords:

Implicit memory; speech-driven facial; audio-to-motion;

DOI:

10.1109/LSP.2024.3356415

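Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Discipline classification codes:

0808; 0809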

Abstract:

Speech-driven facial animation is a challenging problem where each input audio can have multiple plausible facial outputs, leading to overly smooth results. Although the two-stage framework of audio-to-motion model and neural rendering models can partially mitigate this issue, it lacks crucial details like emotions and wrinkles. To overcome these limitations, we introduce a variational motion generator with implicit memory. By incorporating implicit memory into the audio-to-motion model, we capture high-level semantics in the shared latent space of audio expressions, resulting in accurate and expressive facial landmark generation. Next, we introduce attention with time bias to effectively maintain the consistency of audio motion and adopt a periodic position encoding strategy to provide summarization capability for longer audio sequences. Experimental results demonstrate that our approach outperforms previous methods, yielding more extensive and realistic speech-driven facial animation.


Pages: 431-435

Page count: 5
