Detecting Deep-Fake Videos from Phoneme-Viseme Mismatches

Cited by: 77
Authors
Agarwal, Shruti [1]
Farid, Hany [1]
Fried, Ohad [2]
Agrawala, Maneesh [2]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020
Keywords
DOI
10.1109/CVPRW50498.2020.00338
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Recent advances in machine learning and computer graphics have made it easier to convincingly manipulate video and audio. These so-called deep-fake videos range from complete full-face synthesis and replacement (face-swap), to complete mouth and audio synthesis and replacement (lip-sync), and partial word-based audio and mouth synthesis and replacement. Detection of deep fakes with only a small spatial and temporal manipulation is particularly challenging. We describe a technique to detect such manipulated videos by exploiting the fact that the dynamics of the mouth shape (visemes) are occasionally inconsistent with a spoken phoneme. We focus on the visemes associated with words having the sound M (mama), B (baba), or P (papa) in which the mouth must completely close in order to pronounce these phonemes. We observe that this is not the case in many deep-fake videos. Such phoneme-viseme mismatches can, therefore, be used to detect even spatially small and temporally localized manipulations. We demonstrate the efficacy and robustness of this approach to detect different types of deep-fake videos, including in-the-wild deep fakes.
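The check described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the phonemes have already been aligned to video frames and that a per-frame, normalized lip-gap measurement is available. The names PhonemeEvent, find_mismatches, lip_gap, closed_threshold, and margin are hypothetical, and the threshold values are illustrative only.

```python
from dataclasses import dataclass

# Phonemes that require the lips to close fully, as named in the abstract.
CLOSED_LIP_PHONEMES = {"M", "B", "P"}


@dataclass
class PhonemeEvent:
    """Hypothetical record of one aligned phoneme (not from the paper)."""
    phoneme: str      # e.g. "M", "B", "P", "AA"
    start_frame: int  # first video frame of the phoneme (inclusive)
    end_frame: int    # last video frame of the phoneme (inclusive)


def find_mismatches(events, lip_gap, closed_threshold=0.05, margin=1):
    """Return the M/B/P events during which the mouth never closes.

    lip_gap[i] is an assumed normalized vertical lip distance for frame i
    (0.0 means fully closed). closed_threshold and margin are illustrative
    assumptions, not values taken from the paper.
    """
    mismatches = []
    for ev in events:
        if ev.phoneme not in CLOSED_LIP_PHONEMES:
            continue  # only closed-lip phonemes are checked
        # Look a few frames around the phoneme to tolerate alignment error.
        lo = max(0, ev.start_frame - margin)
        hi = min(len(lip_gap) - 1, ev.end_frame + margin)
        window = lip_gap[lo:hi + 1]
        # A mismatch: no frame in the window shows a closed mouth.
        if window and min(window) > closed_threshold:
            mismatches.append(ev)
    return mismatches


if __name__ == "__main__":
    lip_gap = [0.30, 0.20, 0.02, 0.25, 0.30, 0.28, 0.27, 0.30]
    events = [
        PhonemeEvent("M", 2, 2),   # mouth closes at frame 2: consistent
        PhonemeEvent("P", 5, 6),   # mouth stays open: flagged
        PhonemeEvent("AA", 4, 4),  # open-mouth vowel: not checked
    ]
    for ev in find_mismatches(events, lip_gap):
        print("possible mismatch:", ev)
```

In this sketch a video with one or more flagged events would be treated as a candidate for closer inspection. How the mouth state is actually measured and how the evidence is aggregated are left open here.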
Pages: 2814-2822
Page count: 9
Related papers
27 in total
  • [1] Agarwal S., 2019, P IEEE C COMP VIS PA, P38
  • [2] Baltrusaitis T, 2016, IEEE WINT CONF APPL
  • [3] Towards Evaluating the Robustness of Neural Networks
    Carlini, Nicholas
    Wagner, David
    [J]. 2017 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP), 2017, : 39 - 57
  • [4] Everybody Dance Now
    Chan, Caroline
    Ginosar, Shiry
    Zhou, Tinghui
    Efros, Alexei A.
    [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 5932 - 5941
  • [5] Xception: Deep Learning with Depthwise Separable Convolutions
    Chollet, Francois
    [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1800 - 1807
  • [6] Ciftci Umur Aybars, 2019, ARXIV190102212
  • [7] Farid H, 2016, PHOTO FORENSICS, P1
  • [8] Text-based Editing of Talking-head Video
    Fried, Ohad
    Tewari, Ayush
    Zollhofer, Michael
    Finkelstein, Adam
    Shechtman, Eli
    Goldman, Dan B.
    Genova, Kyle
    Jin, Zeyu
    Theobalt, Christian
    Agrawala, Maneesh
    [J]. ACM TRANSACTIONS ON GRAPHICS, 2019, 38 (04):
  • [9] Fighting Fake News: Image Splice Detection via Learned Self-Consistency
    Huh, Minyoung
    Liu, Andrew
    Owens, Andrew
    Efros, Alexei A.
    [J]. COMPUTER VISION - ECCV 2018, PT XI, 2018, 11215 : 106 - 124
  • [10] Karras Tero, 2020, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). Proceedings, P8107, DOI 10.1109/CVPR42600.2020.00813