High Performance DeepFake Video Detection on CNN-Based with Attention Target-Specific Regions and Manual Distillation Extraction

Cited by: 20
Authors
Tran, Van-Nhan [1 ]
Lee, Suk-Hwan [2 ]
Le, Hoanh-Su [3 ]
Kwon, Ki-Ryong [1 ]
Affiliations
[1] Pukyong Natl Univ, Dept Artificial Intelligence Convergence, Busan 48513, South Korea
[2] Dong A Univ, Dept Comp Engn, Busan 49315, South Korea
[3] Vietnam Natl Univ Ho Chi Minh City, Univ Econ & Law, Fac Informat Syst, Ho Chi Minh City 700000, Vietnam
Source
APPLIED SCIENCES-BASEL | 2021, Vol. 11, Issue 16
Funding
National Research Foundation, Singapore;
Keywords
DeepFake detection; computer vision and pattern recognition; artificial intelligence;
DOI
10.3390/app11167678
Chinese Library Classification (CLC)
O6 [Chemistry];
Subject Classification Code
0703;
Abstract
The rapid development of deep learning has made it possible to produce and synthesize hyper-realistic videos, known as DeepFakes. The growth of such forged data has prompted concerns about its use with malicious intent, and detecting forged videos is a crucial subject in the field of digital media. Nowadays, most detection models are based on deep neural networks and vision transformers, with the state-of-the-art (SOTA) model built on an EfficientNet-B7 backbone. However, because they rely on excessively large backbones, these models have the intrinsic drawback of being too heavy. In our research, a high-performance DeepFake detection model for manipulated video is proposed that preserves accuracy while keeping the model weight appropriate. We inherit ideas from previous research on distillation methodology, but our proposal takes a different approach, combining manual distillation extraction, target-specific region extraction, data augmentation, frame and multi-region ensembling, a CNN-based model, and flexible classification with a dynamic threshold. Our proposal can also reduce overfitting, a common and particularly important problem affecting the quality of many models. To evaluate the model, we performed tests on two datasets: on the DeepFake Detection Challenge (DFDC) dataset, our model obtains an AUC of 0.958 and an F1-score of 0.9243, compared with the SOTA model's AUC of 0.972 and F1-score of 0.906; on the smaller Celeb-DF v2 dataset, it obtains an AUC of 0.978 and an F1-score of 0.9628.
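To illustrate the frame and multi-region ensembling and the dynamic-threshold classification mentioned in the abstract, the following is a minimal Python sketch. It is not the authors' released code: the helpers extract_target_regions and cnn_score and the median-based threshold rule are assumptions made only for illustration.

# Minimal sketch (not the authors' code) of frame and multi-region ensembling
# with a dynamic classification threshold, as described in the abstract.
# `extract_target_regions`, `cnn_score`, and the median-based threshold rule
# are illustrative assumptions.
from typing import Callable, List, Sequence

import numpy as np


def ensemble_video_score(
    frames: Sequence[np.ndarray],
    extract_target_regions: Callable[[np.ndarray], List[np.ndarray]],
    cnn_score: Callable[[np.ndarray], float],
) -> float:
    """Average per-region fake probabilities over all sampled frames."""
    scores: List[float] = []
    for frame in frames:
        # Target-specific regions, e.g. eye/nose/mouth crops of the detected face.
        for region in extract_target_regions(frame):
            scores.append(cnn_score(region))  # probability that this region is fake
    return float(np.mean(scores)) if scores else 0.0


def classify_with_dynamic_threshold(score: float, recent_scores: Sequence[float]) -> bool:
    """Flexible classification: the cut-off adapts to recently observed scores
    instead of a fixed 0.5 (one possible reading of the paper's dynamic threshold)."""
    threshold = float(np.median(recent_scores)) if len(recent_scores) else 0.5
    return score >= threshold

In this reading, a video is flagged as fake when its ensembled score exceeds the adaptive cut-off; averaging over many regions and frames is what lets a comparatively light CNN backbone remain accurate.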
Pages: 14