Out of Time: Automated Lip Sync in the Wild

Cited by: 219
Authors
Chung, Joon Son [1 ]
Zisserman, Andrew [1 ]
Affiliation
[1] Univ Oxford, Visual Geometry Grp, Dept Engn Sci, Oxford, England
Source
COMPUTER VISION - ACCV 2016 WORKSHOPS, PT II | 2017 / Vol. 10117
Funding
Engineering and Physical Sciences Research Council (EPSRC);
Keywords
SPEECH; SYNCHRONIZATION; TRANSLATION;
DOI
10.1007/978-3-319-54427-4_19
CLC number
TP39 [Computer applications];
Subject classification
081203; 0835;
Abstract
The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video. We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video. We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
Pages: 251 / 263
Page count: 13
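The synchronisation idea in the abstract can be illustrated with a small sketch: once the two-stream network embeds mouth-image windows and audio windows into a shared space, the lip-sync error is the temporal shift that minimises the distance between the two embedding streams. Below is a minimal NumPy sketch of that offset search, assuming precomputed per-frame embeddings; `sync_offset` and its arguments are hypothetical names for illustration, not the authors' code.

```python
import numpy as np

def sync_offset(video_emb, audio_emb, max_offset=10):
    """Estimate the audio-video offset (in frames) by sliding the audio
    embedding sequence against the video embedding sequence and picking
    the shift with the smallest mean Euclidean distance.

    video_emb, audio_emb: arrays of shape (T, D), one embedding per frame.
    Returns (best_offset, best_distance).
    """
    T = min(len(video_emb), len(audio_emb))
    best_off, best_d = 0, np.inf
    for off in range(-max_offset, max_offset + 1):
        # Align the two streams under this candidate shift.
        v0, a0 = max(0, -off), max(0, off)
        n = T - abs(off)
        if n <= 0:
            continue
        d = np.linalg.norm(
            video_emb[v0:v0 + n] - audio_emb[a0:a0 + n], axis=1
        ).mean()
        if d < best_d:
            best_off, best_d = off, d
    return best_off, best_d
```

In the paper, the embeddings come from the jointly trained two-stream ConvNet; here any pair of aligned feature sequences works, and a confident minimum over shifts indicates the lip-sync error.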