Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited: 0
Authors
Vitalii Fishchuk [1 ]
Daniel Braun [2 ]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g., in educational settings. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
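As a concrete illustration of the kind of black-box attack the abstract describes, below is a minimal sketch of one well-known attack family, homoglyph substitution, in which the attacker only observes detector scores and never sees model internals. Everything in the sketch is an assumption made for illustration: `toy_detector` is a trivial stand-in written so the example runs end to end, not any of the six detectors evaluated in the paper, and `homoglyph_attack` is not the authors' implementation.

```python
import random

def toy_detector(text: str) -> float:
    """Toy stand-in for a black-box neural text detector (an assumption
    for this sketch, NOT a real model): scores a text by the share of
    common English function words, just so the example runs end to end."""
    common = {"the", "of", "and", "to", "a", "in", "is", "that", "it", "as"}
    words = text.lower().split()
    return sum(w in common for w in words) / len(words) if words else 0.0

# Homoglyph substitution: replace Latin letters with visually identical
# Cyrillic code points, so the character sequence the detector sees
# changes while the text still reads the same to a human.
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic а, е, о

def homoglyph_attack(text: str, rate: float = 0.3, seed: int = 0) -> str:
    rng = random.Random(seed)
    return "".join(
        HOMOGLYPHS[ch] if ch in HOMOGLYPHS and rng.random() < rate else ch
        for ch in text
    )

if __name__ == "__main__":
    sample = "It is the case that the model writes in a fluent style."
    perturbed = homoglyph_attack(sample)
    # The attacker's only interface is the score (black-box access).
    print(f"original score:  {toy_detector(sample):.2f}")
    print(f"perturbed score: {toy_detector(perturbed):.2f}")
```

The attack strategies evaluated in the article are broader than this, but the black-box loop is the same pattern: perturb the text, re-query the detector, and compare scores.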
Pages: 861 - 874
Number of pages: 13
Related papers (50 in total)
  • [21] A Methodology for Evaluating the Robustness of Anomaly Detectors to Adversarial Attacks in Industrial Scenarios
    Perales Gomez, Angel Luis
    Fernandez Maimo, Lorenzo
    Garcia Clemente, Felix J.
    Maroto Morales, Javier Alejandro
    Huertas Celdran, Alberto
    Bovet, Gerome
    IEEE ACCESS, 2022, 10 : 124582 - 124594
  • [22] Multi-Agent Attacks for Black-Box Social Recommendations
    Wang, Shijie
    Fan, Wenqi
    Wei, Xiao-yong
    Mei, Xiaowei
    Lin, Shanru
    Li, Qing
    ACM TRANSACTIONS ON INFORMATION SYSTEMS, 2025, 43 (01)
  • [23] Mitigating Adversarial Gray-Box Attacks Against Phishing Detectors
    Apruzzese, Giovanni
    Subrahmanian, V. S.
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2023, 20 (05) : 3753 - 3769
  • [24] Black-Box Adversarial Attack for Deep Learning Classifiers in IoT Applications
    Singh, Abhijit
    Sikdar, Biplab
2022 IEEE 8TH WORLD FORUM ON INTERNET OF THINGS, WF-IOT, 2022
  • [25] MC-Net: Realistic Sample Generation for Black-Box Attacks
    Duan, Mingxing
    Jiao, Kailun
    Yu, Siyang
    Yang, Zhibang
    Xiao, Bin
    Li, Kenli
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2024, 19 : 3008 - 3022
  • [26] Empirical Perturbation Analysis of Two Adversarial Attacks: Black Box versus White Box
    Chitic, Raluca
    Topal, Ali Osman
    Leprevost, Franck
APPLIED SCIENCES-BASEL, 2022, 12 (14)
  • [27] Two Sides of the Same Coin: White-box and Black-box Attacks for Transfer Learning
    Zhang, Yinghua
    Song, Yangqiu
    Liang, Jian
    Bai, Kun
    Yang, Qiang
    KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, : 2989 - 2997
  • [28] Robustness of Sparsely Distributed Representations to Adversarial Attacks in Deep Neural Networks
    Sardar, Nida
    Khan, Sundas
    Hintze, Arend
    Mehra, Priyanka
    ENTROPY, 2023, 25 (06)
  • [29] Improving Robustness Against Adversarial Attacks with Deeply Quantized Neural Networks
    Ayaz, Ferheen
    Zakariyya, Idris
    Cano, Jose
    Keoh, Sye Loong
    Singer, Jeremy
    Pau, Danilo
    Kharbouche-Harrari, Mounia
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [30] Robustness Against Adversarial Attacks in Neural Networks Using Incremental Dissipativity
    Aquino, Bernardo
    Rahnama, Arash
    Seiler, Peter
    Lin, Lizhen
    Gupta, Vijay
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2341 - 2346