Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e. software that is able to detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g. in educational settings. It is, therefore, important for the effectiveness of such tools that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts with six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
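The abstract refers to adversarial attacks against black-box detectors without detailing them. One commonly studied attack class in the neural-text-detection literature is homoglyph substitution, sketched below. This is purely illustrative and makes no assumption about which attack strategies or detectors the paper actually evaluates; the character map and function name are the editor's own.

```python
# A minimal sketch of homoglyph substitution: Latin letters are replaced
# with visually identical Unicode characters, so the text looks unchanged
# to a human reader but tokenizes differently for a detector.

HOMOGLYPHS = {
    "a": "\u0430",  # Cyrillic small letter a
    "e": "\u0435",  # Cyrillic small letter ie
    "o": "\u043e",  # Cyrillic small letter o
}

def homoglyph_attack(text: str) -> str:
    """Swap every mapped Latin character for its Cyrillic homoglyph."""
    return "".join(HOMOGLYPHS.get(ch, ch) for ch in text)

perturbed = homoglyph_attack("generated by a language model")
print(perturbed)  # renders identically to the input
print(perturbed == "generated by a language model")  # prints False
```

A detector that operates on raw token IDs sees entirely different tokens after such a substitution, which is why robust detectors typically normalize confusable characters before scoring.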
Pages: 861–874 (13 pages)