Textual adversarial attacks in cybersecurity named entity recognition

Times cited: 1
Authors
Jiang, Tian [1]
Liu, Yunqi [1]
Cui, Xiaohui [1]
Affiliation
[1] Wuhan University, School of Cyber Science and Engineering, Key Laboratory of Aerospace Information Security and Trusted Computing, Ministry of Education, Wuhan, People's Republic of China
Keywords
Cyber Threat Intelligence; Named Entity Recognition; Fine-tuned models; Adversarial examples; Word substitution; Adversarial detection; Threat intelligence
DOI
10.1016/j.cose.2024.104278
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In the cybersecurity domain, Cyber Threat Intelligence (CTI) encompasses the processes that produce textual reports and various kinds of information and evidence about cyber threats. To better understand attacker behavior and construct attack graphs, it is increasingly important to identify attack-relevant entities in diverse CTI texts precisely and efficiently, and Named Entity Recognition (NER) models can extract such entities automatically. However, fine-tuned NER models of this kind are usually vulnerable to adversarial attacks. In this paper, we first construct an attack framework that explores textual adversarial attacks on the cybersecurity NER task by generating adversarial CTI texts. We then analyze, from a grammatical perspective, which parts of speech (POSs) are most important, and propose a word-substitution-based attack method. To counter adversarial attacks, we also introduce a method for detecting potential adversarial examples. Experimental results show that cybersecurity NER models are also vulnerable to adversarial attacks. Compared with the other attack methods evaluated, ours generates adversarial texts that achieve a balanced performance across several aspects. Furthermore, the adversarial examples generated by all attack methods show good transferability, and they can improve the robustness of NER models through adversarial training. On the defense side, our detection method is simple but effective against multiple types of textual adversarial attacks.
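The abstract does not spell out the attack algorithm, so the following is only a rough sketch of the general idea of a POS-restricted word-substitution attack on an NER model, not the authors' method. It assumes a hypothetical toy tagger (`toy_tagger`) standing in for a fine-tuned model, a hand-written synonym table, and pre-assigned Universal POS tags; all of these names and data are illustrative assumptions. A real attack would query an actual fine-tuned model and would typically draw candidate words from embeddings or a masked language model.

```python
from typing import Callable, Dict, List, Optional, Sequence, Set

# A tagger maps a token sequence to one BIO label per token.
Tagger = Callable[[Sequence[str]], List[str]]

# Hypothetical trigger verbs used by the toy tagger (assumption, not from the paper).
TRIGGER_VERBS = {"exploits", "drops", "encrypts"}


def toy_tagger(tokens: Sequence[str]) -> List[str]:
    """Hypothetical stand-in for a fine-tuned cybersecurity NER model.

    Tags a capitalised token as B-MAL (malware) when the next token is a
    trigger verb. This reliance on context words is the kind of brittleness
    a substitution attack can exploit.
    """
    labels = ["O"] * len(tokens)
    for i in range(len(tokens) - 1):
        if tokens[i][:1].isupper() and tokens[i + 1] in TRIGGER_VERBS:
            labels[i] = "B-MAL"
    return labels


def substitution_attack(
    tokens: Sequence[str],
    pos_tags: Sequence[str],
    target_pos: Set[str],
    synonyms: Dict[str, List[str]],
    tagger: Tagger,
) -> Optional[List[str]]:
    """Generic single-word substitution attack (illustrative sketch).

    Only tokens whose POS tag is in target_pos are perturbed. Returns the
    first perturbed sentence whose predicted labels differ from the original
    prediction, or None if no single swap changes the output.
    """
    original = tagger(tokens)
    for i, pos in enumerate(pos_tags):
        if pos not in target_pos:
            continue  # leave other parts of speech untouched
        for candidate in synonyms.get(tokens[i], []):
            trial = list(tokens)
            trial[i] = candidate
            if tagger(trial) != original:
                return trial  # one substitution is enough to flip a label
    return None


# Toy CTI sentence with hypothetical POS tags and synonym candidates.
tokens = ["WannaCry", "encrypts", "files", "on", "infected", "hosts"]
pos_tags = ["PROPN", "VERB", "NOUN", "ADP", "ADJ", "NOUN"]
synonyms = {"encrypts": ["scrambles", "locks"], "infected": ["compromised"]}

print(toy_tagger(tokens))  # ['B-MAL', 'O', 'O', 'O', 'O', 'O']
adversarial = substitution_attack(tokens, pos_tags, {"VERB", "ADJ"}, synonyms, toy_tagger)
print(adversarial)  # ['WannaCry', 'scrambles', 'files', 'on', 'infected', 'hosts']
print(toy_tagger(adversarial))  # all 'O': the malware entity is no longer recognized
```

In this toy case the entity token itself is never modified; only a neighboring verb changes, which is why the choice of which parts of speech to target matters. The sketch does not reproduce the authors' POS analysis or their detection method.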
Pages: 12