Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0

Authors:

Vitalii Fishchuk [1]

Daniel Braun [2]

Affiliations:

[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede

[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede

Source:

International Journal of Speech Technology | 2024 / Vol. 27 / Issue 4

Keywords:

Adversarial attacks; Generative AI; Large language models; Neural text detection

DOI:

10.1007/s10772-024-10144-2

Abstract:

The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g., in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.

Pages: 861-874

Page count: 13

