Robustness of generative AI detection: adversarial attacks on black-box neural text detectors

Cited by: 0
Authors
Vitalii Fishchuk [1]
Daniel Braun [2]
Affiliations
[1] Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente, Enschede
[2] Department of High-tech Business and Entrepreneurship, University of Twente, Enschede
Keywords
Adversarial attacks; Generative AI; Large language models; Neural text detection
DOI
10.1007/s10772-024-10144-2
Abstract
The increased quality and human-likeness of AI-generated texts have resulted in a rising demand for neural text detectors, i.e., software that can detect whether a text was written by a human or generated by an AI. Such tools are often used in contexts where the use of AI is restricted or completely prohibited, e.g., in education. For these tools to be effective, it is therefore important that they are robust against deliberate attempts to hide the fact that a text was generated by an AI. In this article, we investigate a broad range of adversarial attacks on English texts against six different neural text detectors, including commercial and research tools. While the results show that no detector is completely invulnerable to adversarial attacks, the latest generation of commercial detectors proved to be very robust and was not significantly influenced by most of the evaluated attack strategies. © The Author(s) 2024.
Pages: 861-874
Page count: 13