MultiDiffEditAttack: A Multi-Modal Black-Box Jailbreak Attack on Image Editing Models

Cited by: 0

Authors:

Chen, Peihong [1]

Chen, Feng [2]

Guo, Lei [3]

Affiliations:

[1] Univ Elect Sci & Technol China UESTC, Sch Automat Engn, Chengdu 611731, Peoples R China

[2] Univ Elect Sci & Technol China UESTC, Lab Intelligent Collaborat Comp, Chengdu 611731, Peoples R China

[3] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

Source:

ELECTRONICS | 2025 / Vol. 14 / Issue 05

Keywords:

jailbreak attack; image editing models; multi-modal attack; security for AI

DOI:

10.3390/electronics14050899

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

In recent years, image editing models have made notable advancements and gained widespread use. However, these technologies also present significant security risks by enabling the creation of Not Safe For Work (NSFW) content. This study introduces MDEA (MultiDiffEditAttack), a multi-modal black-box jailbreak attack framework designed to evaluate and challenge the security of image editing models. MDEA leverages large language models and genetic algorithms to generate adversarial prompts that modify sensitive vocabulary structures, thereby bypassing prompt filters. Additionally, MDEA employs transfer learning to optimize input image features, bypassing post-hoc safety checks. By integrating prompt attacks and safety checker attacks, MDEA uses a multi-modal attack strategy to target image editing models in a black-box setting. Experimental results show that MDEA significantly improves attack efficiency against image editing models compared to current black-box methods. These results confirm the effectiveness of MDEA in multi-modal attacks and reveal numerous vulnerabilities in current defense mechanisms.

Pages: 25