BFS2Adv: Black-box adversarial attack towards hard-to-attack short texts

Cited by: 4
Authors
Han, Xu [1 ]
Li, Qiang [1 ]
Cao, Hongbo [1 ]
Han, Lei [2 ]
Wang, Bin [3 ]
Bao, Xuhua [4 ]
Han, Yufei [5 ]
Wang, Wei [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Beijing Key Lab of Security & Privacy in Intelligent Transportation, Beijing 100044, Peoples R China
[2] Beijing Inst Comp Technol & Applicat, Beijing 100584, Peoples R China
[3] Zhejiang Univ, Zhejiang Key Lab Multidimens Percept Technol Appli, Hangzhou 310027, Peoples R China
[4] Sangfor Technologies Inc, Shenzhen 518055, Peoples R China
[5] INRIA, F-35042 Rennes, France
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China; National Key R&D Program of China;
Keywords
Text classification; Adversarial attack; Score-based adversarial attack; Hard-to-attack examples;
DOI
10.1016/j.cose.2024.103817
CLC classification
TP [Automation Technology; Computer Technology];
Discipline code
0812;
Abstract
The advent of Machine Learning as a Service (MLaaS) and deep learning applications has increased the susceptibility of models to adversarial textual attacks, particularly in black-box settings. Prior work on black-box adversarial textual attacks generally follows a stable strategy that involves leveraging char-level, word-level, and sentence-level perturbations, as well as using queries to the target model to find adversarial examples in the search space. However, existing approaches prioritize query efficiency by reducing the search space, thereby overlooking hard-to-attack textual instances. To address this issue, we propose BFS2Adv, a brute-force algorithm that generates adversarial examples for both easy-to-attack and hard-to-attack textual inputs. Starting with an original text, BFS2Adv employs word-level perturbations and synonym substitution to construct a comprehensive search space, with each node representing a potential adversarial example. The algorithm systematically explores this space through a breadth-first search, combined with queries to the target model, to effectively identify qualified adversarial examples. We implemented and evaluated a prototype of BFS2Adv against renowned models such as ALBERT and BERT, using the SNLI and MR datasets. Our results demonstrate that BFS2Adv outperforms state-of-the-art algorithms and effectively improves the success rate of short-text adversarial attacks. Furthermore, we provide detailed insights into the robustness of BFS2Adv by analyzing those hard-to-attack examples.
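The search strategy the abstract describes (a breadth-first exploration of synonym-substitution candidates, querying the black-box target model at each node) can be sketched as follows. This is a minimal illustration only, not the paper's implementation; the function and parameter names (`bfs2adv_sketch`, `synonyms`, `is_adversarial`, `max_depth`) are hypothetical.

```python
from collections import deque

def bfs2adv_sketch(words, synonyms, is_adversarial, max_depth=3):
    """Breadth-first search over word-level synonym substitutions.

    words          -- the original text as a list of tokens
    synonyms       -- dict mapping an original token to candidate substitutes
    is_adversarial -- oracle standing in for a query to the black-box target
                      model; returns True when the candidate flips its label
    max_depth      -- maximum number of substituted positions to explore
    """
    start = tuple(words)
    queue = deque([start])
    seen = {start}
    depth = {start: 0}
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue
        # Each child node substitutes one synonym at one position,
        # so the whole substitution space is enumerated level by level.
        for i in range(len(current)):
            for sub in synonyms.get(words[i], []):
                candidate = current[:i] + (sub,) + current[i + 1:]
                if candidate in seen:
                    continue
                seen.add(candidate)
                if is_adversarial(list(candidate)):  # one model query per node
                    return list(candidate)
                depth[candidate] = depth[current] + 1
                queue.append(candidate)
    return None  # search space exhausted without finding an adversarial example
```

Because the frontier is explored breadth-first, the first success returned uses the fewest substitutions, at the cost of many queries on hard-to-attack inputs, which matches the brute-force trade-off the abstract highlights.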
Pages: 12