MultiDiffEditAttack: A Multi-Modal Black-Box Jailbreak Attack on Image Editing Models

Cited by: 0
Authors
Chen, Peihong [1]
Chen, Feng [2]
Guo, Lei [3]
Affiliations
[1] Univ Elect Sci & Technol China UESTC, Sch Automat Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China UESTC, Lab Intelligent Collaborat Comp, Chengdu 611731, Peoples R China
[3] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Source
ELECTRONICS | 2025 / Vol. 14 / Issue 05
Keywords
jailbreak attack; image editing models; multi-modal attack; security for AI;
DOI
10.3390/electronics14050899
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
In recent years, image editing models have made notable advancements and gained widespread use. However, these technologies also present significant security risks by enabling the creation of Not Safe For Work (NSFW) content. This study introduces MDEA (MultiDiffEditAttack), an innovative multi-modal black-box jailbreak attack framework designed to evaluate and challenge the security of image editing models. MDEA leverages large language models and genetic algorithms to generate adversarial prompts that modify sensitive vocabulary structures, thereby bypassing prompt filters. Additionally, MDEA employs transfer learning to optimize input image features, effectively bypassing post-hoc safety checks. By integrating prompt attacks and safety checker attacks, MDEA utilizes a multimodal attack strategy to target image editing models in a black-box setting. Experimental results demonstrate that MDEA significantly improves the attack efficiency against image editing models compared to current black-box methods. These results demonstrate the effectiveness of MDEA in multi-modal attacks and reveal numerous vulnerabilities in current defense mechanisms.
Pages: 25