Towards Automatic Face-to-Face Translation

Cited by: 104
Authors
Prajwal, K. R. [1]
Mukhopadhyay, Rudrabha [1]
Philip, Jerin [1]
Jha, Abhishek [1]
Namboodiri, Vinay [1]
Jawahar, C. V. [1]
Affiliation
[1] IIIT Hyderabad, Hyderabad, India
Source
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19) | 2019
Keywords
Lip Synthesis; Translation systems; Cross-language talking face generation; Neural Machine Translation; Speech to Speech Translation; Voice Transfer;
DOI
10.1145/3343031.3351066
Chinese Library Classification
TP39 [Computer Applications];
Discipline classification code
081203; 0835;
Abstract
In light of the recent breakthroughs in automatic machine translation systems, we propose a novel approach that we term "Face-to-Face Translation". As today's digital communication becomes increasingly visual, we argue that there is a need for systems that can automatically translate a video of a person speaking in language A into a target language B with realistic lip synchronization. In this work, we create an automatic pipeline for this problem and demonstrate its impact in multiple real-world applications. First, we build a working speech-to-speech translation system by bringing together multiple existing modules from speech and language. We then move towards "Face-to-Face Translation" by incorporating a novel visual module, LipGAN, for generating realistic talking faces from the translated audio. Quantitative evaluation of LipGAN on the standard LRW test set shows that it significantly outperforms existing approaches across all standard metrics. We also subject our Face-to-Face Translation pipeline to multiple human evaluations and show that it can significantly improve the overall user experience for consuming and interacting with multimodal content across languages. Code, models and a demo video are made publicly available.
Pages: 1428-1436
Page count: 9