Exposing fake images generated by text-to-image diffusion models

Cited by: 0
Authors
Xu, Qiang [1,2]
Wang, Hao [3]
Meng, Laijin [4]
Mi, Zhongjie [4]
Yuan, Jianye [5]
Yan, Hong [1,2]
Affiliations
[1] City Univ Hong Kong, Dept Elect Engn, Kowloon, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Ctr Intelligent Multidimens Data Anal, Kowloon, Hong Kong, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Coll Comp Sci & Technol, Chongqing 400065, Peoples R China
[4] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China
[5] Wuhan Univ, Sch Elect Informat, Wuhan 473072, Peoples R China
Keywords
Text-to-image; Diffusion models (DM); Image forensics; Attention mechanism; Vision transformers (ViTs)
DOI: not available
CLC number: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Text-to-image diffusion models (DM) have posed unprecedented challenges to the authenticity and integrity of digital images, making the detection of computer-generated images one of the most important image forensics techniques. However, the detection of images generated by text-to-image diffusion models is rarely reported in the literature. To tackle this issue, we first analyze the acquisition process of DM images. We then construct a hybrid neural network built on an attention-guided feature extraction (AGFE) module and a vision transformer (ViT)-based feature extraction (ViTFE) module. The AGFE module adopts an attention mechanism to capture long-range feature interactions and boost the representation capability. The ViTFE module, which stacks a MobileNetV2 (MNV2) block and MobileViT blocks in sequence, is designed to learn global representations. Extensive experiments on different types of generated images demonstrate the effectiveness and robustness of our method in exposing fake images generated by text-to-image diffusion models.
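As a rough illustration of this high-level design, the PyTorch sketch below chains an AGFE-style convolutional stem with self-attention, a ViTFE-style stage (a MobileNetV2 inverted-residual block followed by a simplified MobileViT-style block), and a binary real/fake head. All layer widths, block counts, and the exact attention and fusion forms are assumptions for illustration; the record above only describes the architecture at a high level.

# Minimal sketch of the described hybrid detector (hypothetical sizes/blocks).
import torch
import torch.nn as nn

class AGFE(nn.Module):
    """Attention-guided feature extraction: conv features refined by
    self-attention to capture long-range interactions (assumed form)."""
    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, ch, 3, stride=2, padding=1), nn.BatchNorm2d(ch), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, stride=2, padding=1), nn.BatchNorm2d(ch), nn.SiLU(),
        )
        self.attn = nn.MultiheadAttention(embed_dim=ch, num_heads=4, batch_first=True)

    def forward(self, x):
        f = self.conv(x)                             # (B, C, H, W)
        b, c, h, w = f.shape
        tokens = f.flatten(2).transpose(1, 2)        # (B, H*W, C)
        attn_out, _ = self.attn(tokens, tokens, tokens)
        return attn_out.transpose(1, 2).reshape(b, c, h, w) + f

class MNV2Block(nn.Module):
    """MobileNetV2-style inverted residual block."""
    def __init__(self, ch, expand=4):
        super().__init__()
        hidden = ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(ch, hidden, 1), nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, hidden, 3, padding=1, groups=hidden),
            nn.BatchNorm2d(hidden), nn.SiLU(),
            nn.Conv2d(hidden, ch, 1), nn.BatchNorm2d(ch),
        )

    def forward(self, x):
        return x + self.block(x)

class MobileViTBlock(nn.Module):
    """Simplified MobileViT-style block: local conv plus a global transformer
    over flattened positions (patch folding/unfolding omitted for brevity)."""
    def __init__(self, ch, depth=2):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, 3, padding=1)
        layer = nn.TransformerEncoderLayer(d_model=ch, nhead=4,
                                           dim_feedforward=2 * ch, batch_first=True)
        self.global_enc = nn.TransformerEncoder(layer, num_layers=depth)
        self.fuse = nn.Conv2d(2 * ch, ch, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        local = self.local(x)
        tokens = local.flatten(2).transpose(1, 2)    # (B, H*W, C)
        glob = self.global_enc(tokens).transpose(1, 2).reshape(b, c, h, w)
        return self.fuse(torch.cat([x, glob], dim=1))

class DMImageDetector(nn.Module):
    """AGFE stem, then a ViTFE stage (MNV2 + MobileViT), then a binary head."""
    def __init__(self, ch=64):
        super().__init__()
        self.agfe = AGFE(ch=ch)
        self.vitfe = nn.Sequential(MNV2Block(ch), MobileViTBlock(ch))
        self.head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(ch, 2))

    def forward(self, x):
        return self.head(self.vitfe(self.agfe(x)))

if __name__ == "__main__":
    logits = DMImageDetector()(torch.randn(2, 3, 256, 256))  # real vs. DM-generated
    print(logits.shape)  # torch.Size([2, 2])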
Pages: 76-82 (7 pages)