DeepFake detection algorithm based on improved vision transformer

Cited by: 36
Authors
Heo, Young-Jin [1 ]
Yeo, Woon-Ha [2 ]
Kim, Byung-Gyu [3 ]
Affiliations
[1] Sookmyung Women's Univ, Seoul, South Korea
[2] Sookmyung Women's Univ, Dept Convergence Sci, Seoul, South Korea
[3] Sookmyung Women's Univ, Dept Informat Technol (IT) Engn, Seoul, South Korea
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK;
Keywords
Deep learning; Deepfake detection; Distillation; Generative adversarial network; Vision transformer;
DOI
10.1007/s10489-022-03867-9
CLC Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
A DeepFake is a manipulated video made with generative deep learning technologies, such as generative adversarial networks or autoencoders, that anyone can utilize. With the increase in DeepFakes, classifiers based on convolutional neural networks (CNNs) that can distinguish them have been actively developed. However, CNNs are prone to overfitting and cannot model the relations between local regions as a global image feature, resulting in misclassification. In this paper, we propose an efficient vision transformer model for DeepFake detection that extracts both local and global features. We concatenate CNN feature vectors with patch-based position embeddings so that every position can interact with all others to localize the artifact region. For the distillation token, the logit is trained with binary cross-entropy through a sigmoid function; adding this distillation generalizes the proposed model and improves its performance. In experiments, the proposed model outperforms the SOTA model by 0.006 AUC and 0.013 F1 score on the DFDC test dataset. Of 2,500 fake videos, the proposed model correctly predicts 2,313 as fake, whereas the SOTA model predicts at most 2,276. With the ensemble method, the proposed model outperforms the SOTA model by 0.01 AUC. On the Celeb-DF (v2) dataset, the proposed model achieves 0.993 AUC and a 0.978 F1 score.
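As a concrete illustration of the approach the abstract describes, below is a minimal PyTorch sketch: a CNN feature vector is concatenated with patch tokens so self-attention can relate local patches to a global descriptor, and a distillation token's logit is supervised with binary cross-entropy through a sigmoid. The module name DeepFakeViT, the toy CNN stem, and all layer sizes are illustrative assumptions, not the authors' exact configuration; in true distillation the distillation token would follow a teacher network's predictions rather than the ground-truth labels used here.

# Minimal sketch (assumed sizes and a toy CNN stem, not the paper's exact
# backbone) of a ViT-style DeepFake detector that fuses CNN features with
# patch embeddings and supervises a distillation token via sigmoid + BCE.
import torch
import torch.nn as nn

class DeepFakeViT(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, img_size=224, patch=16, dim=256, depth=4, heads=8):
        super().__init__()
        n_patches = (img_size // patch) ** 2
        # Patch embedding: non-overlapping patches -> dim-d tokens.
        self.patch_embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        # Toy CNN branch yielding one global feature vector per image
        # (a stand-in for the paper's CNN feature extractor).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, dim),
        )
        # Learnable class and distillation tokens; positions cover
        # n_patches + 3 slots (cls + dist + CNN feature + patches).
        self.cls_tok = nn.Parameter(torch.zeros(1, 1, dim))
        self.dist_tok = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 3, dim))
        enc = nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc, depth)
        self.cls_head = nn.Linear(dim, 1)   # real/fake logit from class token
        self.dist_head = nn.Linear(dim, 1)  # logit from distillation token

    def forward(self, x):
        b = x.size(0)
        patches = self.patch_embed(x).flatten(2).transpose(1, 2)  # (B, N, dim)
        cnn_feat = self.cnn(x).unsqueeze(1)                       # (B, 1, dim)
        # Concatenate the CNN feature with patch tokens so self-attention
        # relates local patches to the global CNN descriptor.
        tokens = torch.cat(
            [self.cls_tok.expand(b, -1, -1),
             self.dist_tok.expand(b, -1, -1),
             cnn_feat, patches], dim=1) + self.pos
        out = self.encoder(tokens)
        return self.cls_head(out[:, 0]), self.dist_head(out[:, 1])

# Training step: both logits are supervised with BCE through a sigmoid
# (ground-truth labels stand in for a teacher's predictions here).
model = DeepFakeViT()
bce = nn.BCEWithLogitsLoss()  # applies the sigmoid internally
imgs, labels = torch.randn(2, 3, 224, 224), torch.tensor([[1.0], [0.0]])
cls_logit, dist_logit = model(imgs)
loss = bce(cls_logit, labels) + bce(dist_logit, labels)
loss.backward()

In practice one would swap the toy stem for a pretrained feature extractor and drive the distillation head from a teacher model's outputs, in the DeiT style; the sketch only shows how the token fusion and sigmoid/BCE supervision fit together.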
Pages: 7512-7527
Page count: 16