Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that is able to detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in educational settings. It is, therefore, important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts with six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
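The abstract refers to adversarial attacks against black-box detectors without detailing them. One commonly studied attack class in the neural-text-detection literature is homoglyph substitution, sketched below. This is purely illustrative and makes no assumption about which attack strategies or detectors the paper actually evaluates; the character map and function name are the editor's own.

```python
# A minimal sketch of homoglyph substitution: Latin letters are replaced
# with visually identical Unicode characters, so the text looks unchanged
# to a human reader but tokenizes differently for a detector.

HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
}

def homoglyph_attack(text: str) -> str:
    """Swap every mapped Latin character for its Cyrillic homoglyph."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

perturbed = homoglyph_attack("generated by a language model")
print(perturbed)  # renders identically to the input
print(perturbed == "generated by a language model")  # prints False
```

A detector that operates on raw token IDs sees entirely different tokens after such a substitution, which is why robust detectors typically normalize confusable characters before scoring.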
Pages: 861–874 (13 pages)