Out of Time: Automated Lip Sync in the Wild

Cited by: 219
Authors
Chung, Joon Son [1 ]
Zisserman, Andrew [1 ]
Affiliation
[1] Univ Oxford, Visual Geometry Grp, Dept Engn Sci, Oxford, England
Source
COMPUTER VISION - ACCV 2016 WORKSHOPS, PT II | 2017 / Vol. 10117
Funding
Engineering and Physical Sciences Research Council (EPSRC);
Keywords
SPEECH; SYNCHRONIZATION; TRANSLATION;
DOI
10.1007/978-3-319-54427-4_19
CLC number
TP39 [Computer applications];
Subject classification
081203; 0835;
Abstract
The goal of this work is to determine the audio-video synchronisation between mouth motion and speech in a video. We propose a two-stream ConvNet architecture that enables the mapping between the sound and the mouth images to be trained end-to-end from unlabelled data. The trained network is used to determine the lip-sync error in a video. We apply the network to two further tasks: active speaker detection and lip reading. On both tasks we set a new state-of-the-art on standard benchmark datasets.
Pages: 251 / 263
Page count: 13
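The synchronisation idea in the abstract can be illustrated with a small sketch: once the two-stream network embeds mouth-image windows and audio windows into a shared space, the lip-sync error is the temporal shift that minimises the distance between the two embedding streams. Below is a minimal NumPy sketch of that offset search, assuming precomputed per-frame embeddings; `sync_offset` and its arguments are hypothetical names for illustration, not the authors' code.

```python
import numpy as np

def sync_offset(video_emb, audio_emb, max_offset=10):
    """Estimate the audio-video offset (in frames) by sliding the audio
    embedding sequence against the video embedding sequence and picking
    the shift with the smallest mean Euclidean distance.

    video_emb, audio_emb: arrays of shape (T, D), one embedding per frame.
    Returns (best_offset, best_distance).
    """
    T = min(len(video_emb), len(audio_emb))
    best_off, best_d = 0, np.inf
    for off in range(-max_offset, max_offset + 1):
        # Align the two streams under this candidate shift.
        v0, a0 = max(0, -off), max(0, off)
        n = T - abs(off)
        if n <= 0:
            continue
        d = np.linalg.norm(
            video_emb[v0:v0 + n] - audio_emb[a0:a0 + n], axis=1
        ).mean()
        if d < best_d:
            best_off, best_d = off, d
    return best_off, best_d
```

In the paper, the embeddings come from the jointly trained two-stream ConvNet; here any pair of aligned feature sequences works, and a confident minimum over shifts indicates the lip-sync error.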