Cybersecurity Named Entity Recognition Using Multi-Modal Ensemble Learning

Times cited: 27
Authors
Yi, Feng [1]
Jiang, Bo [2]
Wang, Lu [2]
Wu, Jianjun [3]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci, Zhongshan Inst, Zhongshan 528402, Peoples R China
[2] Chinese Acad Sci, Inst Informat Engn, Beijing 100093, Peoples R China
[3] Beijing Coll Polit & Law, Beijing 100024, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cybersecurity; named entity recognition; regular expression; known-entity dictionary; conditional random fields
DOI
10.1109/ACCESS.2020.2984582
CLC number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Cybersecurity named entity recognition is an important part of extracting threat information from large-scale unstructured text collections in many cybersecurity applications. Most existing security entity recognition studies and systems use either regular-expression matching strategies or machine learning algorithms. Because of the peculiarity and complexity of security named entities, these models ignore the characteristics of security data and the correlations between entities. Therefore, through an in-depth study of the characteristics of security entities, we propose a novel security named entity recognition model based on regular expressions and a known-entity dictionary, as well as conditional random fields (CRF) combined with four feature templates. This model is named RDF-CRF. The rule-based expressions can match security entities with good accuracy in simpler situations, the known-entity dictionary can extract common and specific security entities, and the CRF-based extractor leverages the entities identified by the rule-based and dictionary-based extractors to further improve recognition performance. To demonstrate the effectiveness of the proposed model, extensive experiments are performed on a security text dataset collected from public security websites. The experimental results show that RDF-CRF achieves better performance than state-of-the-art methods.
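The abstract describes three cooperating extractors but gives no implementation details in this record. The sketch below shows one plausible way they could fit together, assuming that the CRF "leverages" the rule-based and dictionary-based results by receiving them as per-token features. The regular expressions, the entity types (CVE, IP, MD5, MALWARE, VENDOR), the dictionary entries, and the feature names are all hypothetical illustrations; they are not the paper's four feature templates, which are not specified here.

```python
import re
from typing import Dict, List, Optional

# Hypothetical rule set: entity types with a regular surface form.
# These patterns are illustrative assumptions, not the paper's rules.
REGEX_RULES = {
    "CVE": re.compile(r"CVE-\d{4}-\d{4,}"),
    "IP": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
    "MD5": re.compile(r"[0-9a-fA-F]{32}"),
}

# Hypothetical known-entity dictionary: lower-cased surface form -> type.
KNOWN_ENTITIES = {
    "wannacry": "MALWARE",
    "emotet": "MALWARE",
    "apache": "VENDOR",
}


def regex_label(token: str) -> Optional[str]:
    """Return the type of the first rule that matches the whole token."""
    for label, pattern in REGEX_RULES.items():
        if pattern.fullmatch(token):
            return label
    return None


def dict_label(token: str) -> Optional[str]:
    """Case-insensitive lookup in the known-entity dictionary."""
    return KNOWN_ENTITIES.get(token.lower())


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Feature dict for token i; the rule and dictionary outputs are
    added as ordinary features so the CRF can weigh them."""
    tok = tokens[i]
    feats: Dict[str, object] = {
        "word.lower": tok.lower(),
        "word.isupper": tok.isupper(),
        "word.isdigit": tok.isdigit(),
        "word.suffix3": tok[-3:],
        "regex": regex_label(tok) or "O",
        "dict": dict_label(tok) or "O",
    }
    # One-token context window on each side, with sentence-boundary flags.
    if i > 0:
        feats["-1:word.lower"] = tokens[i - 1].lower()
        feats["-1:dict"] = dict_label(tokens[i - 1]) or "O"
    else:
        feats["BOS"] = True
    if i < len(tokens) - 1:
        feats["+1:word.lower"] = tokens[i + 1].lower()
        feats["+1:dict"] = dict_label(tokens[i + 1]) or "O"
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token of one tokenized sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


if __name__ == "__main__":
    sent = "WannaCry exploited CVE-2017-0144 from 10.0.0.5".split()
    for tok, f in zip(sent, sentence_features(sent)):
        print(tok, f["regex"], f["dict"])
```

Each sentence becomes a list of feature dicts, which is the input shape expected by linear-chain CRF libraries such as sklearn-crfsuite (`CRF.fit(X, y)`, with `y` the matching per-token label sequences). Passing the extractor outputs as features, rather than letting them override the CRF's predictions, lets the trained model decide how much to trust each extractor.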
Pages: 63214-63224
Page count: 11