Interactive Two-Stream Network Across Modalities for Deepfake Detection

Cited by: 13

Authors:

Wu, Jianghao [1]

Zhang, Baopeng [1]

Li, Zhaoyang [1]

Pang, Guilin [1]

Teng, Zhu [1]

Fan, Jianping [2]

Affiliations:

[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing 100044, Peoples R China

[2] Lenovo Res, AI Lab, Beijing 100085, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2023, Vol. 33, No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Deepfake detection; inconsistency representation; cross-modality learning

DOI:

10.1109/TCSVT.2023.3269841

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

As face forgery techniques have matured, the proliferation of deepfakes may threaten the security of human society. Although existing deepfake detection methods perform well in in-dataset evaluation, their generalization ability still needs improvement, and the representation of imperceptible artifacts plays a significant role in it. In this paper, we propose an Interactive Two-Stream Network (ITSNet) to explore discriminative inconsistency representations from a cross-modality perspective. In particular, the patch-wise Decomposable Discrete Cosine Transform (DDCT) is adopted to extract fine-grained high-frequency clues, and information from the different modalities is exchanged through a dedicated interaction module. To perceive temporal inconsistency, we first develop a Short-term Embedding Module (SEM) to refine the subtle local inconsistency representation between adjacent frames, and then design a Long-term Embedding Module (LEM) to further refine the erratic temporal inconsistency representation from a long-range perspective. Extensive experiments on three public datasets show that ITSNet outperforms state-of-the-art methods in both in-dataset and cross-dataset evaluations.
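To make the idea of patch-wise frequency clues concrete, the following is a minimal sketch, not the DDCT described in the paper. It assumes a plain orthonormal 2-D DCT-II applied to non-overlapping patches of a grayscale image, and it summarizes each patch by the energy of coefficients whose index sum is at or above a cutoff. The function names, the 8-pixel patch size, and the cutoff rule are all illustrative assumptions.

```python
import math


def dct_1d(values):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(values)
    out = []
    for k in range(n):
        total = sum(
            values[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n)
        )
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * total)
    return out


def dct_2d(block):
    """Separable 2-D DCT-II: transform the rows, then the columns."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(list(col)) for col in zip(*rows)]
    # cols is indexed [v][u]; transpose back to [u][v].
    return [list(row) for row in zip(*cols)]


def high_freq_energy(block, cutoff):
    """Sum of squared DCT coefficients with u + v >= cutoff (assumed rule)."""
    coeffs = dct_2d(block)
    return sum(
        c * c
        for u, row in enumerate(coeffs)
        for v, c in enumerate(row)
        if u + v >= cutoff
    )


def patchwise_high_freq(image, patch=8, cutoff=4):
    """Map of high-frequency energy, one value per non-overlapping patch.

    `image` is a list of equal-length rows of floats (grayscale).
    Rows and columns that do not fill a whole patch are ignored.
    """
    height, width = len(image), len(image[0])
    result = []
    for y in range(0, height - patch + 1, patch):
        row = []
        for x in range(0, width - patch + 1, patch):
            block = [image[y + dy][x:x + patch] for dy in range(patch)]
            row.append(high_freq_energy(block, cutoff))
        result.append(row)
    return result


if __name__ == "__main__":
    flat = [[1.0] * 16 for _ in range(16)]
    checker = [[float((x + y) % 2) for x in range(16)] for y in range(16)]
    print(patchwise_high_freq(flat))     # energies close to zero
    print(patchwise_high_freq(checker))  # clearly non-zero energies
```

A flat image yields near-zero energy in every patch, while a checkerboard concentrates almost all of its non-DC energy in the high-frequency band, which is the kind of contrast a frequency stream could exploit.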


Pages: 6418-6430

Page count: 13

