Talking human face generation: A survey

Cited by: 24
Authors
Toshpulatov, Mukhiddin [1]
Lee, Wookey [1]
Lee, Suan [2]
Affiliations
[1] Inha Univ, Biomed Sci & Engn, 100 Inha Ro,Michuhol Gu, Incheon 22212, South Korea
[2] Semyung Univ, Sch Comp Sci, Jecheon 27136, South Korea
Keywords
Talking human face animation; 3D face generation; Deep generative model; Autoencoder; Neural radiance field; Datasets; Evaluation metrics; Neural networks; Unsupervised learning; Mel spectrogram; ADVERSARIAL NETWORKS; 3D; GAN; CLASSIFICATION; MODEL; GEOCHEMISTRY
DOI
10.1016/j.eswa.2023.119678
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Talking human face generation aims to synthesize a natural human face that talks in correspondence with a given text or audio sequence. With the recent development of Deep Learning (DL) methods for data generation, such as Convolutional Neural Networks (CNNs), Generative Adversarial Networks (GANs), and Neural Radiance Fields (NeRF), talking human face generation has attracted significant research interest from academia and industry. These methods have recently been explored and used to address several problems in image processing and computer vision. Despite notable advances, applying them to real-world problems such as talking human face generation remains challenging. The generation of deepfakes with the abovementioned methods would greatly promote many fascinating applications, including augmented reality, virtual reality, computer games, teleconferencing, virtual try-on, special movie effects, and avatars. This survey reviews and discusses DL-related methods, including CNNs, GANs, and NeRF, and their application to talking human face generation. We analyze existing approaches with respect to their application to talking face generation, investigate the related general problems, and highlight open research issues. We also provide quantitative and qualitative evaluations of existing approaches in the field.
Pages: 28