Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0

Authors:

Vitalii Fishchuk [1]

Daniel Braun [2]

Affiliations:

[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede

[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede

Source:

International Journal of Speech Technology, 2024, Vol. 27, No. 4

Keywords:

Adversarial attacks; Generative AI; Large language models; Neural text detection

DOI:

10.1007/s10772-024-10144-2

Abstract:

The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly affected by most of the evaluated attack strategies. © The Author(s) 2024.
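The abstract does not name the individual attack strategies. As a purely illustrative sketch, assuming a homoglyph-substitution attack (a commonly discussed character-level perturbation against neural text detectors, not necessarily one of the attacks evaluated in this article), the Python below shows how such a perturbation can be applied and how its effect on a detector could be quantified as an attack success rate. The names `HOMOGLYPHS`, `homoglyph_attack`, `attack_success_rate`, and `naive_detector` are hypothetical, and `naive_detector` is a toy stand-in for the black-box tools studied in the article.

```python
import random
from typing import Callable, List

# Hypothetical Latin -> Cyrillic look-alike map; a small subset chosen for this sketch.
HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small a
    "c": "\u0441",  # Cyrillic small es
    "e": "\u0435",  # Cyrillic small ie
    "o": "\u043e",  # Cyrillic small o
    "p": "\u0440",  # Cyrillic small er
}


def homoglyph_attack(text: str, rate: float, seed: int = 0) -> str:
    """Replace each mappable character with its look-alike with probability `rate`."""
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    out = []
    for ch in text:
        if ch in HOMOGLYPHS and rng.random() < rate:
            out.append(HOMOGLYPHS[ch])
        else:
            out.append(ch)
    return "".join(out)


def attack_success_rate(
    ai_texts: List[str],
    detector: Callable[[str], bool],
    attack: Callable[[str], str],
) -> float:
    """Share of AI texts flagged before the attack that are no longer flagged after it.

    `detector` is assumed to return True when it judges a text AI-generated.
    """
    flagged = [t for t in ai_texts if detector(t)]
    if not flagged:
        return 0.0  # nothing to evade
    evaded = sum(1 for t in flagged if not detector(attack(t)))
    return evaded / len(flagged)


def naive_detector(text: str) -> bool:
    """Toy stand-in (hypothetical): flags any pure-ASCII text as AI-generated."""
    return text.isascii()


if __name__ == "__main__":
    samples = ["the model wrote a poem"]
    full_attack = lambda t: homoglyph_attack(t, rate=1.0)
    print(attack_success_rate(samples, naive_detector, full_attack))
```

A detector that is robust in the abstract's sense would keep flagging the perturbed texts, driving this rate toward zero; the toy detector here is deliberately fragile so that the attack fully succeeds.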

Pages: 861-874

Number of pages: 13