Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited: 0
Authors
Vitalii Fishchuk [1 ]
Daniel Braun [2 ]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g., in educational settings. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
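As a concrete illustration of the kind of black-box attack the abstract describes, below is a minimal sketch of one well-known attack family, homoglyph substitution, in which the attacker only observes detector scores and never sees model internals. Everything in the sketch is an assumption made for illustration: `toy_detector` is a trivial stand-in written so the example runs end to end, not any of the six detectors evaluated in the paper, and `homoglyph_attack` is not the authors' implementation.

```python
import random

def toy_detector(text: str) -> float:
    """Toy stand-in for a black-box neural text detector (an assumption
    for this sketch, NOT a real model): scores a text by the share of
    common English function words, just so the example runs end to end."""
    common = {"the", "of", "and", "to", "a", "in", "is", "that", "it", "as"}
    words = text.lower().split()
    return sum(w in common for w in words) / len(words) if words else 0.0

# Homoglyph substitution: replace Latin letters with visually identical
# Cyrillic code points, so the character sequence the detector sees
# changes while the text still reads the same to a human.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic а, е, о

def homoglyph_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

if __name__ == "__main__":
    sample = "It is the case that the model writes in a fluent style."
    perturbed = homoglyph_attack(sample)
    # The attacker's only interface is the score (black-box access).
    print(f"original score:  {toy_detector(sample):.2f}")
    print(f"perturbed score: {toy_detector(perturbed):.2f}")
```

The attack strategies evaluated in the article are broader than this, but the black-box loop is the same pattern: perturb the text, re-query the detector, and compare scores.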
Pages: 861 - 874
Number of pages: 13
Related papers (50 in total)
  • [21] A Methodology for Evaluating the Robustness of Anomaly Detectors to Adversarial Attacks in Industrial Scenarios
    Perales Gomez, Angel Luis
    Fernandez Maimo, Lorenzo
    Garcia Clemente, Felix J.
    Maroto Morales, Javier Alejandro
    Huertas Celdran, Alberto
    Bovet, Gerome
    IEEE ACCESS, 2022, 10 : 124582 - 124594
  • [22] Multi-Agent Attacks for Black-Box Social Recommendations
    Wang, Shijie
    Fan, Wenqi
    Wei, Xiao-yong
    Mei, Xiaowei
    Lin, Shanru
    Li, Qing
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (01)
  • [23] Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors
    Apruzzese, Giovanni
    Subrahmanian, V. S.
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20 (05) : 3753 - 3769
  • [24] Black-Box Adversarial Attack for Deep Learning Classifiers in IoT Applications
    Singh, Abhijit
    Sikdar, Biplab
2022 IEEE 8TH WORLD FORUM ON INTERNET OF THINGS, WF-IOT, 2022
  • [25] MC-Net: Realistic Sample Generation for Black-Box Attacks
    Duan, Mingxing
    Jiao, Kailun
    Yu, Siyang
    Yang, Zhibang
    Xiao, Bin
    Li, Kenli
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 3008 - 3022
  • [26] Empirical Perturbation Analysis of Two Adversarial Attacks: Black Box versus White Box
    Chitic, Raluca
    Topal, Ali Osman
    Leprevost, Franck
APPLIED SCIENCES-BASEL, 2022, 12 (14)
  • [27] Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning
    Zhang, Yinghua
    Song, Yangqiu
    Liang, Jian
    Bai, Kun
    Yang, Qiang
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 2989 - 2997
  • [28] Robustness of Sparsely Distributed Representations to Adversarial Attacks in Deep Neural Networks
    Sardar, Nida
    Khan, Sundas
    Hintze, Arend
    Mehra, Priyanka
    ENTROPY, 2023, 25 (06)
  • [29] Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks
    Ayaz, Ferheen
    Zakariyya, Idris
    Cano, Jose
    Keoh, Sye Loong
    Singer, Jeremy
    Pau, Danilo
    Kharbouche-Harrari, Mounia
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [30] Robustness Against Adversarial Attacks in Neural Networks Using Incremental Dissipativity
    Aquino, Bernardo
    Rahnama, Arash
    Seiler, Peter
    Lin, Lizhen
    Gupta, Vijay
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2341 - 2346