INVARIANT MOTION REPRESENTATION LEARNING FOR 3D TALKING FACE SYNTHESIS

Cited by: 0
Authors
Liu, Jiyuan [1]
Wei, Wenping [1]
Li, Zhendong [1, 2]
Li, Guanfeng [1, 2]
Liu, Hao [1, 2]
Affiliations
[1] Ningxia Univ, Sch Informat Engn, Yinchuan, Peoples R China
[2] Ningxia Key Lab Artificial Intelligence & Informa, Yinchuan, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Funding
US National Science Foundation;
Keywords
Talking face synthesis; NeRF models;
DOI
10.1109/ICASSP48485.2024.10446379
Abstract
In this paper, we propose an invariant motion representation learning method for deformable talking face synthesis. Conventional NeRF-based methods learn to match audio to motion without considering motion consistency information, leading to blurry results, especially when face sequences are captured in the wild. To address this limitation, our model explores audio-motion invariance directly from video clips and exploits facial movements based on any given piece of speech. Specifically, we develop motion invariance and audio-motion contrastive learning modules and then produce facial motion to probe facial landmarks into intra-person identity and intra-motion classes. Thus, our proposed cycle-loop paradigm reinforces lip synchronization and inter-frame consistency. Experimental results show the effectiveness of our method.
Pages: 4700-4704
Page count: 5
Related papers
50 in total
  • [1] THE INVARIANT NATURE OF 3D REPRESENTATION FROM MOTION
    PAVEL, M
    WEINSHALL, D
    INVESTIGATIVE OPHTHALMOLOGY & VISUAL SCIENCE, 1994, 35 (04) : 1277 - 1277
  • [2] An Image Representation for the 3D Face Synthesis
    Luo, Guoliang
    Zeng, Wei
    Xie, Wenqiang
    Lei, Haopeng
    Xian, Chuhua
    PROCEEDINGS OF THE 31ST INTERNATIONAL CONFERENCE ON COMPUTER ANIMATION AND SOCIAL AGENTS (CASA 2016), 2015, : 27 - 31
  • [3] Disentangled Representation Learning for 3D Face Shape
    Jiang, Zi-Hang
    Wu, Qianyi
    Chen, Keyu
    Zhang, Juyong
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 11949 - 11958
  • [4] 3D Face Representation Using Scale and Transform Invariant Features
    Akaguenduez, Erdem
    Ulusoy, Ilkay
    2008 IEEE 16TH SIGNAL PROCESSING, COMMUNICATION AND APPLICATIONS CONFERENCE, VOLS 1 AND 2, 2008, : 14 - 17
  • [5] Geometric Invariant Representation Learning for 3D Point Cloud
    Li, Zongmin
    Zhang, Yupeng
    Bai, Yun
    2021 IEEE 33RD INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2021), 2021, : 1480 - 1485
  • [6] Learning 3D Face Representation with Vision Transformer for Masked Face Recognition
    Wang, Yuan
    Yang, Zhen
    Zhang, Zhiqiang
    Zang, Huaijuan
    Zhu, Qiang
    Zhan, Shu
    2022 ASIA CONFERENCE ON ALGORITHMS, COMPUTING AND MACHINE LEARNING (CACML 2022), 2022, : 505 - 511
  • [7] Face feature detection for 3D model of talking head with speech synthesis
    Talafova, R.
    Rozinaj, G.
    2007 14TH INTERNATIONAL WORKSHOP ON SYSTEMS, SIGNALS, & IMAGE PROCESSING & EURASIP CONFERENCE FOCUSED ON SPEECH & IMAGE PROCESSING, MULTIMEDIA COMMUNICATIONS & SERVICES, 2007, : 339 - +
  • [8] Motion recognition and synthesis based on 3D sparse representation
    Xiang, Jian
    Liang, Ronghua
    SIGNAL PROCESSING, 2015, 110 : 82 - 93
  • [9] Learning Distribution Independent Latent Representation for 3D Face Disentanglement
    Zhang, Zihui
    Yu, Cuican
    Li, Huibin
    Sun, Jian
    Liu, Feng
    2020 INTERNATIONAL CONFERENCE ON 3D VISION (3DV 2020), 2020, : 848 - 857
  • [10] Learning Robust 3D Face Reconstruction and Discriminative Identity Representation
    Luo, Yao
    Tu, Xiaoguang
    Xie, Mei
    2019 2ND IEEE INTERNATIONAL CONFERENCE ON INFORMATION COMMUNICATION AND SIGNAL PROCESSING (ICICSP), 2019, : 317 - 321