MultiDiffEditAttack: A Multi-Modal Black-Box Jailbreak Attack on Image Editing Models
Citations: 0
Authors:
Chen, Peihong [1]
Chen, Feng [2]
Guo, Lei [3]
Affiliations:
[1] Univ Elect Sci & Technol China UESTC, Sch Automat Engn, Chengdu 611731, Peoples R China
[2] Univ Elect Sci & Technol China UESTC, Lab Intelligent Collaborat Comp, Chengdu 611731, Peoples R China
[3] Univ Elect Sci & Technol China UESTC, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Source:
ELECTRONICS, 2025, Volume 14, Issue 05
Keywords:
jailbreak attack;
image editing models;
multi-modal attack;
security for AI;
DOI:
10.3390/electronics14050899
Chinese Library Classification (CLC) number:
TP [Automation Technology, Computer Technology];
Discipline classification code:
0812;
Abstract:
In recent years, image editing models have made notable advancements and gained widespread use. However, these technologies also present significant security risks by enabling the creation of Not Safe For Work (NSFW) content. This study introduces MDEA (MultiDiffEditAttack), an innovative multi-modal black-box jailbreak attack framework designed to evaluate and challenge the security of image editing models. MDEA leverages large language models and genetic algorithms to generate adversarial prompts that modify sensitive vocabulary structures, thereby bypassing prompt filters. Additionally, MDEA employs transfer learning to optimize input image features, effectively bypassing post-hoc safety checks. By integrating prompt attacks and safety checker attacks, MDEA uses a multi-modal attack strategy to target image editing models in a black-box setting. Experimental results demonstrate that MDEA significantly improves the attack efficiency against image editing models compared to current black-box methods. These results show the effectiveness of MDEA in multi-modal attacks and reveal numerous vulnerabilities in current defense mechanisms.