Multi-task Learning for Detecting and Segmenting Manipulated Facial Images and Videos

Cited by: 187
Authors
Nguyen, Huy H. [1]
Fang, Fuming [2]
Yamagishi, Junichi [1, 2, 4]
Echizen, Isao [1, 2, 3]
Affiliations
[1] Grad Univ Adv Studies, SOKENDAI, Hayama, Kanagawa, Japan
[2] Natl Inst Informat, Tokyo, Japan
[3] Univ Tokyo, Tokyo, Japan
[4] Univ Edinburgh, Edinburgh, Midlothian, Scotland
Source
2019 IEEE 10TH INTERNATIONAL CONFERENCE ON BIOMETRICS THEORY, APPLICATIONS AND SYSTEMS (BTAS) | 2019
DOI
10.1109/btas46853.2019.9185974
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Detecting manipulated images and videos is an important topic in digital media forensics. Most detection methods use binary classification to determine the probability of a query being manipulated. Another important topic is locating manipulated regions (i.e., performing segmentation), which are mostly created by three commonly used attacks: removal, copy-move, and splicing. We have designed a convolutional neural network that uses the multi-task learning approach to simultaneously detect manipulated images and videos and locate the manipulated regions for each query. Information gained by performing one task is shared with the other task, thereby enhancing the performance of both tasks. A semi-supervised learning approach is used to improve the network's generalizability. The network includes an encoder and a Y-shaped decoder. Activation of the encoded features is used for the binary classification. The output of one branch of the decoder is used for segmenting the manipulated regions, while that of the other branch is used for reconstructing the input, which helps improve overall performance. Experiments using the FaceForensics and FaceForensics++ databases demonstrated the network's effectiveness against facial reenactment attacks and face swapping attacks as well as its ability to deal with the mismatch condition for previously seen attacks. Moreover, fine-tuning using just a small amount of data enables the network to deal with unseen attacks.
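The abstract describes three outputs trained together: a binary real/fake decision, a segmentation mask, and a reconstruction of the input. The following is a minimal sketch of how such a joint objective could be combined, not the authors' implementation. The specific loss forms (binary cross-entropy for classification and segmentation, mean squared error for reconstruction), the equal weights, and all function names are assumptions made for illustration; the abstract does not specify them.

```python
import math


def binary_cross_entropy(p, y, eps=1e-7):
    """Cross-entropy between a predicted probability p and a 0/1 label y."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def segmentation_loss(pred_mask, true_mask):
    """Mean per-pixel cross-entropy over a flattened mask (assumed form)."""
    total = sum(binary_cross_entropy(p, y) for p, y in zip(pred_mask, true_mask))
    return total / len(true_mask)


def reconstruction_loss(recon, image):
    """Mean squared error between reconstructed and input pixels (assumed form)."""
    return sum((r - x) ** 2 for r, x in zip(recon, image)) / len(image)


def multitask_loss(p_fake, label, pred_mask, true_mask, recon, image,
                   weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three task losses; the weights are hypothetical."""
    w_cls, w_seg, w_rec = weights
    return (w_cls * binary_cross_entropy(p_fake, label)
            + w_seg * segmentation_loss(pred_mask, true_mask)
            + w_rec * reconstruction_loss(recon, image))


# Toy example: a flattened 4-pixel "image" labeled as manipulated (label = 1).
image = [0.2, 0.8, 0.5, 0.1]
true_mask = [0, 1, 1, 0]
loss = multitask_loss(
    p_fake=0.9, label=1,
    pred_mask=[0.1, 0.8, 0.7, 0.2], true_mask=true_mask,
    recon=[0.25, 0.75, 0.5, 0.1], image=image,
)
print(f"joint loss: {loss:.4f}")
```

Because all three terms are summed into one scalar, a gradient step on it would update the shared encoder using signals from every task, which is one way to read the abstract's statement that information gained on one task is shared with the other.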
Pages: 8