Learning Features of Intra-Consistency and Inter-Diversity: Keys Toward Generalizable Deepfake Detection

Cited by: 39
Authors
Chen, Han [1 ,2 ]
Lin, Yuzhen [1 ,2 ]
Li, Bin [1 ,2 ]
Tan, Shunquan [1 ,2 ]
Affiliations
[1] Shenzhen Univ, Guangdong Key Lab Intelligent Informat Proc, Shenzhen Key Lab Media Secur, Shenzhen 518060, Peoples R China
[2] Shenzhen Univ, Guangdong Lab Artificial Intelligence & Digital Ec, Shenzhen 518060, Peoples R China
Keywords
Deepfakes; Task analysis; Forgery; Faces; Feature extraction; Data models; Transformers; Deepfake detection; self-supervised learning; masked image modeling; transformer; generalization;
DOI
10.1109/TCSVT.2022.3209336
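Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code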
0808 ; 0809 ;
Abstract
Public concern about deepfake face forgery has risen continually in recent years. Most deepfake detection approaches attempt to learn discriminative features between real and fake faces through end-to-end trained deep neural networks. However, most of them generalize poorly across different data sources, forgery methods, and/or post-processing operations. In this paper, following a simple but effective principle of discriminative representation learning, i.e., learning features with intra-consistency within classes and inter-diversity between classes, we combine a novel transformer-based self-supervised learning method with an effective data augmentation strategy for generalizable deepfake detection. Because the differences between real and fake images are often subtle and local, the proposed method first uses Self Prediction Learning (SPL) to learn rich hidden representations by predicting masked patches at a pre-training stage. In this way, intra-class consistency clues in images can be mined without deepfake labels. After pre-training, the discrimination model is fine-tuned via multi-task learning, which includes a deepfake classification task and a forgery mask estimation task. Fine-tuning is facilitated by our new data augmentation method, the Adjustable Forgery Synthesizer (AFS), which explicitly simulates the process of synthesizing deepfake images at various levels of visual realism. AFS substantially reduces the overfitting caused by insufficient diversity in the training data. Comprehensive experiments demonstrate that our method outperforms state-of-the-art competitors on several popular benchmark datasets in terms of generalization to unseen forgery methods and unseen datasets.
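The masked-patch prediction step mentioned in the abstract can be illustrated with a minimal sketch of random patch masking, the generic first step of masked image modeling. This is not the authors' SPL implementation: the function names, the 14 x 14 patch grid, and the 0.4 mask ratio are all hypothetical choices made for illustration, and the abstract does not state how patches are selected or what the model is asked to predict for them.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio=0.4, seed=None):
    """Return a grid_h x grid_w grid of booleans; True marks a masked patch.

    mask_ratio and the uniform random selection are assumptions for this
    sketch, not values taken from the paper.
    """
    if not 0.0 <= mask_ratio <= 1.0:
        raise ValueError("mask_ratio must be in [0, 1]")
    rng = random.Random(seed)
    n_patches = grid_h * grid_w
    n_masked = int(round(n_patches * mask_ratio))
    # Sample patch indices without replacement.
    masked = set(rng.sample(range(n_patches), n_masked))
    return [
        [(r * grid_w + c) in masked for c in range(grid_w)]
        for r in range(grid_h)
    ]


def apply_mask(patches, mask, fill=0.0):
    """Replace masked entries of a patch grid with a fill value."""
    return [
        [fill if m else p for p, m in zip(patch_row, mask_row)]
        for patch_row, mask_row in zip(patches, mask)
    ]


# Hypothetical setup: a 14 x 14 grid of patches, each summarized by one number.
mask = random_patch_mask(14, 14, mask_ratio=0.4, seed=0)
patches = [[1.0] * 14 for _ in range(14)]
visible = apply_mask(patches, mask)
n_masked = sum(sum(row) for row in mask)
```

In a pre-training loop of this kind, a model would receive `visible` and be trained to predict the content at the positions where `mask` is True, so no real/fake labels are needed at that stage.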
Pages: 1468-1480
Page count: 13