Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Citations: 337

Authors:

Liu, Honggu ^{[1,2]}

Li, Xiaodan ^{[2]}

Zhou, Wenbo ^{[1]}

Chen, Yuefeng ^{[2]}

He, Yuan ^{[2]}

Xue, Hui ^{[2]}

Zhang, Weiming ^{[1]}

Yu, Nenghai ^{[1]}

Affiliations:

[1] University of Science and Technology of China, CAS Key Laboratory of Electromagnetic Space Information, Hefei, People's Republic of China

[2] Alibaba Group, Hangzhou, People's Republic of China

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Keywords:

DOI:

10.1109/CVPR46437.2021.00083

Chinese Library Classification number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The remarkable success of face forgery techniques has received considerable attention in computer vision due to security concerns. We observe that up-sampling is a necessary step in most face forgery techniques, and that cumulative up-sampling produces obvious changes in the frequency domain, especially in the phase spectrum. According to the properties of natural images, the phase spectrum preserves abundant frequency components that provide extra information and compensate for the loss of the amplitude spectrum. To this end, we present a novel Spatial-Phase Shallow Learning (SPSL) method for face forgery detection, which combines the spatial image and the phase spectrum to capture the up-sampling artifacts of face forgery and improve transferability. We also theoretically analyze the validity of utilizing the phase spectrum. Moreover, we notice that local texture information is more crucial than high-level semantic information for the face forgery detection task, so we reduce the receptive fields by making the network shallower, which suppresses high-level features and focuses on local regions. Extensive experiments show that SPSL achieves state-of-the-art performance on cross-dataset evaluation and multi-class classification, and obtains comparable results on single-dataset evaluation.
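The idea of pairing the spatial image with its phase spectrum can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function names `phase_only_image` and `spatial_phase_input`, the grayscale averaging, and the four-channel stacking are all assumptions made for illustration. It shows one plausible way to discard the amplitude spectrum, keep only the phase, and attach the result to the RGB input.

```python
import numpy as np


def phase_only_image(gray):
    """Inverse DFT of the unit-amplitude spectrum of a 2-D array (hypothetical helper)."""
    spectrum = np.fft.fft2(gray)
    phase = np.angle(spectrum)
    # Drop the amplitude and keep only exp(j * phase).
    phase_only = np.fft.ifft2(np.exp(1j * phase))
    # For real input the result is real up to floating-point error.
    return np.real(phase_only)


def spatial_phase_input(rgb):
    """Stack an (H, W, 3) image with a phase-only channel into (H, W, 4).

    Averaging the channels to grayscale and appending a fourth channel
    are assumptions of this sketch, not details taken from the abstract.
    """
    rgb = np.asarray(rgb, dtype=np.float64)
    gray = rgb.mean(axis=-1)
    phase_channel = phase_only_image(gray)
    return np.concatenate([rgb, phase_channel[..., None]], axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((8, 8, 3))  # stand-in for a face crop
    print(spatial_phase_input(image).shape)
```

A detector built along these lines would feed the stacked array to a network with few layers, in keeping with the abstract's point that small receptive fields favor local texture over high-level semantics.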


Pages: 772-781

Page count: 10

49 references in total

