Multi-features based Semantic Augmentation Networks for Named Entity Recognition in Threat Intelligence

Cited by: 9

Authors:

Liu, Peipei [1,2]

Li, Hong [1,2]

Wang, Zuoguang [1,2]

Liu, Jie [1,2]

Ren, Yimo [1,2]

Zhu, Hongsong [1,2]

Affiliations:

[1] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

[2] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

Source:

2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR) | 2022

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China

Keywords:

cybersecurity; named entity recognition; multi-features; semantic augmentation; attention mechanism;

DOI:

10.1109/ICPR56361.2022.9956373

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Extracting cybersecurity entities such as attackers and vulnerabilities from unstructured network texts is an important part of security analysis. However, intelligence data are sparse because cybersecurity entity names vary frequently and are often random, which makes it difficult for current methods to extract security-related concepts and entities well. To this end, we propose a semantic augmentation method that incorporates different linguistic features to enrich the representation of input tokens, in order to detect and classify cybersecurity entity names in unstructured text. In particular, we encode and aggregate the constituent feature, the morphological feature and the part-of-speech feature of each input token to improve the robustness of the method. Moreover, each token receives augmented semantic information from two sources: its K most similar words in a cybersecurity domain corpus, where an attentive module weighs the differences among those words, and contextual clues based on a large-scale general-domain corpus. We have conducted experiments on the cybersecurity datasets DNRTI and MalwareTextDB, and the results demonstrate the effectiveness of the proposed method.
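
The abstract's idea of augmenting a token with its K most similar domain words, weighted by an attentive module, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, the attention scoring function, or how the result is combined with the token, so cosine similarity, dot-product scores with a softmax, and concatenation are all assumptions made here, as are the function names and the toy vectors.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k_similar(token_vec, vocab, k):
    """Return the k (word, vector) pairs from vocab most similar to token_vec."""
    ranked = sorted(vocab.items(),
                    key=lambda item: cosine(token_vec, item[1]),
                    reverse=True)
    return ranked[:k]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def augment(token_vec, vocab, k=2):
    """Concatenate token_vec with an attention-weighted sum of its k nearest words.

    Assumption: attention scores are plain dot products between the token and
    each neighbour; a trained model would normally use learned projections.
    """
    neighbours = top_k_similar(token_vec, vocab, k)
    scores = [sum(a * b for a, b in zip(token_vec, vec)) for _, vec in neighbours]
    weights = softmax(scores)
    context = [sum(w * vec[i] for w, (_, vec) in zip(weights, neighbours))
               for i in range(len(token_vec))]
    names = [name for name, _ in neighbours]
    return token_vec + context, names, weights


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings standing in for a domain corpus.
    domain_vocab = {"trojan": [1.0, 0.0], "worm": [0.8, 0.6], "bank": [0.0, 1.0]}
    augmented, names, weights = augment([1.0, 0.1], domain_vocab, k=2)
    print(names, weights, augmented)
```

In this sketch the augmented vector is twice the length of the input embedding, since the weighted neighbour summary is appended rather than added; other ways of combining the two are equally plausible.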

Citation:

Pages: 1557-1563

Page count: 7

References: 45 in total

