Interactive Two-Stream Network Across Modalities for Deepfake Detection

Cited by: 13
Authors
Wu, Jianghao [1]
Zhang, Baopeng [1]
Li, Zhaoyang [1]
Pang, Guilin [1]
Teng, Zhu [1]
Fan, Jianping [2]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China
[2] Lenovo Res, AI Lab, Beijing 100085, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Deepfake detection; inconsistency representation; cross-modality learning
DOI
10.1109/TCSVT.2023.3269841
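Chinese Library Classification (CLC) codes
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification codes
0808; 0809
Abstract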
As face forgery techniques mature, the proliferation of deepfakes may threaten the security of human society. Although existing deepfake detection methods perform well in in-dataset evaluation, their generalization ability remains to be improved, and the representation of imperceptible artifacts plays a significant role in it. In this paper, we propose an Interactive Two-Stream Network (ITSNet) to explore discriminative inconsistency representations from a cross-modality perspective. In particular, the patch-wise Decomposable Discrete Cosine Transform (DDCT) is adopted to extract fine-grained high-frequency clues, and information from the different modalities is exchanged via a designed interaction module. To perceive temporal inconsistency, we first develop a Short-term Embedding Module (SEM) to refine the subtle local inconsistency representation between adjacent frames, and then design a Long-term Embedding Module (LEM) to further refine the erratic temporal inconsistency representation from a long-range perspective. Extensive experiments on three public datasets show that ITSNet outperforms state-of-the-art methods in both in-dataset and cross-dataset evaluations.
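The abstract does not spell out how DDCT is computed, so the following is only a generic illustration of the underlying idea of extracting high-frequency content patch by patch with a DCT. It is not the paper's DDCT. The function names, the 8x8 patch size, and the `u + v >= cutoff` frequency mask are all assumptions made for this sketch.

```python
import numpy as np


def dct_matrix(n: int) -> np.ndarray:
    """Return the orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]  # frequency index (rows)
    i = np.arange(n)[None, :]  # sample index (columns)
    mat = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    mat[0, :] = np.sqrt(1.0 / n)  # DC row has its own normalization
    return mat


def patchwise_high_freq(image: np.ndarray, patch: int = 8, cutoff: int = 4) -> np.ndarray:
    """Keep only the high-frequency DCT coefficients of each patch.

    Hypothetical helper: a coefficient (u, v) is treated as "high frequency"
    when u + v >= cutoff. This mask is an assumption, not the paper's rule.
    """
    h, w = image.shape
    if h % patch or w % patch:
        raise ValueError("image height and width must be multiples of the patch size")

    d = dct_matrix(patch)
    u, v = np.meshgrid(np.arange(patch), np.arange(patch), indexing="ij")
    mask = (u + v) >= cutoff

    # Split the image into non-overlapping patches: (rows, cols, patch, patch).
    blocks = image.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)

    coeffs = d @ blocks @ d.T   # 2-D DCT of every patch (broadcast matmul)
    coeffs = coeffs * mask      # zero out the low-frequency coefficients
    recon = d.T @ coeffs @ d    # inverse 2-D DCT of every patch

    # Stitch the patches back into an image of the original shape.
    return recon.transpose(0, 2, 1, 3).reshape(h, w)


if __name__ == "__main__":
    frame = np.random.default_rng(0).random((32, 32))  # stand-in for a grayscale face crop
    residual = patchwise_high_freq(frame)
    print(residual.shape)
```

In a two-stream setup of the kind the abstract describes, a residual map like this could serve as the input to a frequency stream alongside the RGB stream; how ITSNet actually builds and fuses the two streams is not stated in this record.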
Pages: 6418-6430
Page count: 13