Deepfake Detection: Analyzing Model Generalization Across Architectures, Datasets, and Pre-Training Paradigms

Cited by: 3
Authors
Khan, Sohail Ahmed [1 ]
Dang-Nguyen, Duc-Tien [1 ]
Affiliations
[1] Univ Bergen, Dept Informat Sci & Media Studies, MediaFutures, N-5007 Bergen, Norway
Keywords
Deepfake detection; image classification; convolutional neural networks; transformers; video processing
DOI
10.1109/ACCESS.2023.3348450
CLC Classification Number
TP [Automation Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
As deepfake technology gains traction, the need for reliable detection systems is crucial. Recent research has introduced various deep learning-based detection systems, yet they exhibit limitations in generalising effectively across diverse data distributions that differ from the training data. Our study focuses on understanding the generalisation challenge by exploring different aspects such as deep learning model architectures, pre-training strategies and datasets. Through a comprehensive comparative analysis, we evaluate multiple supervised and self-supervised deep learning models for deepfake detection. Specifically, we evaluate eight supervised deep learning architectures and two transformer-based models pre-trained using self-supervised strategies (DINO, CLIP) on four different deepfake detection benchmarks (FakeAVCeleb, CelebDF-V2, DFDC and FaceForensics++). Our analysis encompasses both intra-dataset and inter-dataset evaluations, with the objective of identifying the top-performing models, determining which datasets equip trained models with the best generalisation capabilities, and assessing the influence of image augmentations on model performance. We also investigate the trade-off between model size, efficiency and performance. Our main goal is to provide insights into the effectiveness of different deep learning architectures (transformers, CNNs), training strategies (supervised, self-supervised) and deepfake detection benchmarks. Following an extensive empirical analysis, we conclude that transformer models surpass CNN models in deepfake detection. Furthermore, we show that the FaceForensics++ and DFDC datasets equip models with comparably better generalisation capabilities than the FakeAVCeleb and CelebDF-V2 datasets. Our analysis also demonstrates that image augmentations can be beneficial in achieving improved performance, particularly for transformer models.
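The intra-/inter-dataset protocol the abstract describes can be sketched as a cross-evaluation matrix: train one detector per benchmark, then score every detector on every benchmark. Diagonal entries are intra-dataset results; off-diagonal entries measure generalisation. This is a minimal illustrative sketch, not the paper's code; the toy `train_fn`/`eval_fn` stand-ins and their scores are placeholders, and only the four benchmark names come from the study.

```python
from itertools import product

# Benchmark names from the study; everything else below is illustrative.
DATASETS = ["FaceForensics++", "DFDC", "CelebDF-V2", "FakeAVCeleb"]

def cross_dataset_matrix(train_fn, eval_fn, datasets):
    """Train one detector per dataset, then score every (train, test)
    pair. Diagonal = intra-dataset; off-diagonal = inter-dataset."""
    models = {d: train_fn(d) for d in datasets}
    return {
        (tr, te): eval_fn(models[tr], te)
        for tr, te in product(datasets, repeat=2)
    }

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end; a real study would
    # plug in actual model training and AUC evaluation here.
    matrix = cross_dataset_matrix(
        train_fn=lambda d: f"model@{d}",
        eval_fn=lambda m, te: 1.0 if m.endswith(te) else 0.5,
        datasets=DATASETS,
    )
    for (tr, te), score in sorted(matrix.items()):
        kind = "intra" if tr == te else "inter"
        print(f"{kind}: train={tr:15s} test={te:15s} score={score}")
```

A real run would replace the lambdas with training and AUC evaluation; averaging each row's off-diagonal entries then ranks training datasets by the generalisation they confer, which is how conclusions like "FaceForensics++ and DFDC generalise better" can be read off the matrix.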
Pages: 1880-1908
Page Count: 29