Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches

Cited by: 77

Authors:

Agarwal, Shruti [1]

Farid, Hany [1]

Fried, Ohad [2]

Agrawala, Maneesh [2]

Affiliations:

[1] Univ Calif Berkeley, Berkeley, CA 94720 USA

[2] Stanford Univ, Stanford, CA 94305 USA

Source:

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020

DOI:

10.1109/CVPRW50498.2020.00338

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Recent advances in machine learning and computer graphics have made it easier to convincingly manipulate video and audio. These so-called deep-fake videos range from complete full-face synthesis and replacement (face-swap), to complete mouth and audio synthesis and replacement (lip-sync), and partial word-based audio and mouth synthesis and replacement. Detection of deep fakes with only a small spatial and temporal manipulation is particularly challenging. We describe a technique to detect such manipulated videos by exploiting the fact that the dynamics of the mouth shape - visemes - are occasionally inconsistent with a spoken phoneme. We focus on the visemes associated with words having the sound M (mama), B (baba), or P (papa) in which the mouth must completely close in order to pronounce these phonemes. We observe that this is not the case in many deep-fake videos. Such phoneme-viseme mismatches can, therefore, be used to detect even spatially small and temporally localized manipulations. We demonstrate the efficacy and robustness of this approach to detect different types of deep-fake videos, including in-the-wild deep fakes.
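
The closed-mouth check described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a phoneme alignment (frame ranges per phoneme) and a per-frame mouth-openness score are already available from some upstream tools, and it uses a plain threshold where the paper may use a different measurement. The names `PhonemeSpan`, `find_mismatches`, `closed_threshold`, and `margin` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Assumed label set for the sounds named in the abstract (M, B, P).
MBP_PHONEMES = {"M", "B", "P"}


@dataclass
class PhonemeSpan:
    """Hypothetical container for one aligned phoneme."""
    label: str  # phoneme label, e.g. "M"
    start: int  # first video frame index (inclusive)
    end: int    # last video frame index (inclusive)


def find_mismatches(
    spans: Sequence[PhonemeSpan],
    mouth_open: Sequence[float],
    closed_threshold: float = 0.1,  # assumed cutoff for "mouth closed"
    margin: int = 1,                # assumed tolerance for alignment error
) -> List[PhonemeSpan]:
    """Return the M/B/P spans in which the mouth never appears closed."""
    mismatches = []
    for span in spans:
        if span.label not in MBP_PHONEMES:
            continue
        # Widen the window by a few frames on each side, clipped to the video.
        lo = max(0, span.start - margin)
        hi = min(len(mouth_open) - 1, span.end + margin)
        if hi < lo:
            continue  # span lies outside the measured frames
        window = mouth_open[lo:hi + 1]
        # A mismatch: no frame in the window reaches the closed state.
        if min(window) > closed_threshold:
            mismatches.append(span)
    return mismatches


# Toy data: the mouth closes at frame 2 but stays open around frame 6.
openness = [0.5, 0.4, 0.05, 0.4, 0.5, 0.6, 0.5, 0.6]
aligned = [PhonemeSpan("M", 2, 2), PhonemeSpan("AA", 3, 4), PhonemeSpan("B", 6, 6)]
flagged = find_mismatches(aligned, openness)
print([s.label for s in flagged])
```

In this toy input only the B span is flagged, because the openness score never drops to the assumed threshold near frame 6. A real detector would also need to decide how many flagged spans make a video suspicious, which this sketch leaves open.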

Pages: 2814-2822

Page count: 9
