V3CViT: Deepfake Detection Based on Video Vision Transformer and 3D Convolution Network

Cited by: 1
Authors
Li, Ruotong [1 ,2 ,3 ]
Yin, Huanpu [1 ,2 ,3 ]
Li, Yan [1 ,2 ,3 ]
Li, Haisheng [1 ,2 ,3 ]
Affiliations
[1] Beijing Technol & Business Univ, Sch Comp & Artificial Intelligence, Beijing 100048, Peoples R China
[2] Beijing Key Lab Big Data Technol Food Safety, Beijing 100048, Peoples R China
[3] Natl Engn Lab Agriprod Qual Traceabil, Beijing 100048, Peoples R China
Source
PROCEEDINGS OF 2024 CHINESE INTELLIGENT SYSTEMS CONFERENCE, VOL II, CISC 2024 | 2024 / Vol. 1284
Funding
National Natural Science Foundation of China; Beijing Natural Science Foundation;
Keywords
Deepfake detection; Video vision transformer; Spatio-temporal modeling; Facial relationship learning;
DOI
10.1007/978-981-97-8654-1_32
Chinese Library Classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in Generative Adversarial Networks for synthesizing fake face videos, the proliferation of forged content poses a significant threat to societal security. In video-based detection methods, compression removes discriminative features, and deep learning methods often overlook the interrelationships between features, which limits model generalization. We therefore propose a deepfake video detection framework named V3CViT. The framework combines a video vision transformer and a 3D convolutional neural network to extract facial features in both the temporal and spatial dimensions, and fuses them with attention mechanisms to obtain effective features. The structured facial feature maps are then passed to a Gated Graph Convolutional Network, which learns facial relationship information for the detection task. Experiments show that, compared with existing detection methods, our model captures facial features in forged videos more comprehensively and achieves higher accuracy on the mixed dataset.
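The abstract says that features from the transformer branch and the 3D convolution branch are combined using attention, but gives no formula. The sketch below is only an illustration of one generic way such a fusion could work (scaled dot-product attention over two branch vectors); it is not the authors' implementation. The function names, the `query` vector, and all numeric values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(branch_features, query):
    """Fuse equally sized feature vectors with scaled dot-product attention.

    Assumption: each branch yields one vector of the same length as `query`.
    Returns the weighted sum of the branch vectors and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [
        sum(f * q for f, q in zip(feat, query)) / scale
        for feat in branch_features
    ]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, branch_features))
        for i in range(len(query))
    ]
    return fused, weights


# Hypothetical toy features standing in for the two branches.
vivit_feat = [0.2, 0.8, 0.1, 0.5]   # video vision transformer branch
conv3d_feat = [0.6, 0.1, 0.9, 0.3]  # 3D convolution branch
query = [1.0, 0.0, 1.0, 0.0]        # hypothetical query (learned in a real model)

fused, weights = attention_fuse([vivit_feat, conv3d_feat], query)
print(weights, fused)
```

In a trained model the query and the branch features would be learned tensors; the toy lists here only show how attention weights decide each branch's contribution to the fused vector.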
Pages: 307-316
Page count: 10
Related papers
19 in total
[1]   ViViT: A Video Vision Transformer [J].
Arnab, Anurag ;
Dehghani, Mostafa ;
Heigold, Georg ;
Sun, Chen ;
Lucic, Mario ;
Schmid, Cordelia .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :6816-6826
[2]  
Carion Nicolas, 2020, Computer Vision - ECCV 2020. 16th European Conference. Proceedings. Lecture Notes in Computer Science (LNCS 12346), P213, DOI 10.1007/978-3-030-58452-8_13
[3]  
Chen S, 2021, AAAI CONF ARTIF INTE, V35, P1081
[4]   Not made for each other - Audio-Visual Dissonance-based Deepfake Detection and Localization [J].
Chugh, Komal ;
Gupta, Parul ;
Dhall, Abhinav ;
Subramanian, Ramanathan .
MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, :439-447
[5]   On the Detection of Digital Face Manipulation [J].
Dang, Hao ;
Liu, Feng ;
Stehouwer, Joel ;
Liu, Xiaoming ;
Jain, Anil K. .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :5780-5789
[6]   Lips Don't Lie: A Generalisable and Robust Approach to Face Forgery Detection [J].
Haliassos, Alexandros ;
Vougioukas, Konstantinos ;
Petridis, Stavros ;
Pantic, Maja .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :5037-5047
[7]   EXPOSING GAN-GENERATED FACES USING INCONSISTENT CORNEAL SPECULAR HIGHLIGHTS [J].
Hu, Shu ;
Li, Yuezun ;
Lyu, Siwei .
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, :2500-2504
[8]   FakeLocator: Robust Localization of GAN-Based Face Manipulations [J].
Huang, Yihao ;
Juefei-Xu, Felix ;
Guo, Qing ;
Liu, Yang ;
Pu, Geguang .
IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2022, 17 :2657-2672
[9]  
Jiang L., 2020, Deeperforensics-1.0: a large-scale dataset for real-world face forgery detection, P2889, DOI 10.1109/42600.2020.00296
[10]   Face X-ray for More General Face Forgery Detection [J].
Li, Lingzhi ;
Bao, Jianmin ;
Zhang, Ting ;
Yang, Hao ;
Chen, Dong ;
Wen, Fang ;
Guo, Baining .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :5000-5009