Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that is able to detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in education. It is therefore important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
Pages: 861-874
Page count: 13