CyNER: Information Extraction from Unstructured Text of CTI Sources with Noncontextual IOCs

Cited by: 8
Authors
Fujii, Shota [1, 2]
Kawaguchi, Nobutaka [1]
Shigemoto, Tomohiro [1]
Yamauchi, Toshihiro [3]
Affiliations
[1] Hitachi Ltd, Res & Dev Grp, Yokohama, Kanagawa, Japan
[2] Okayama Univ, Grad Sch Nat Sci & Technol, Okayama, Japan
[3] Okayama Univ, Fac Nat Sci & Technol, Okayama, Japan
Source
ADVANCES IN INFORMATION AND COMPUTER SECURITY, IWSEC 2022 | 2022 / Vol. 13504
Keywords
Cyber Threat Intelligence; Information Extraction; Named Entity Recognition; Relation Extraction; STIX;
DOI
10.1007/978-3-031-15255-9_5
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Cybersecurity threats have been increasing and growing more sophisticated year by year. In such circumstances, gathering Cyber Threat Intelligence (CTI) and following up with up-to-date threat information is crucial. Structured CTI such as Structured Threat Information eXpression (STIX) is particularly useful because it can automate security operations such as updating FW/IDS rules and analyzing attack trends. However, as most CTIs are written in natural language, manual analysis with domain knowledge is required, which becomes quite time-consuming. In this work, we propose CyNER, a method for automatically structuring CTIs and converting them into STIX format. CyNER extracts named entities in the context of CTI and then extracts the relations between named entities and IOCs in order to convert them into STIX. In addition, by using key phrase extraction, CyNER can extract relations between IOCs that lack contextual information, such as those listed at the bottom of a CTI, and named entities. We describe our design and implementation of CyNER and demonstrate that it can extract named entities with the F-measure of 0.80 and extract relations between named entities and IOCs with the maximum accuracy of 81.6%. Our analysis of structured CTI showed that CyNER can extract IOCs that are not included in existing reputation sites, and that it can automatically extract IOCs that have been exploited for a long time and across multiple attack groups. CyNER is thus expected to contribute to the efficiency of CTI analysis.
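The abstract describes a pipeline that ends in STIX objects linking named entities to IOCs. The following is a minimal Python sketch of that final step only, not the authors' implementation: the regular expressions, the function names `extract_iocs` and `to_stix_bundle`, and the choice to link every IOC to one caller-supplied malware name are all assumptions made for illustration. CyNER itself uses named entity recognition, relation extraction, and key phrase extraction, none of which is reproduced here, and the output is STIX 2.1-style JSON that has not been validated against the specification.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Hypothetical, simplified IOC patterns keyed by STIX object path.
# Real CTI text also needs defanging, IPv6, URLs, domains, and so on.
IOC_PATTERNS = {
    "ipv4-addr:value": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "file:hashes.'SHA-256'": re.compile(r"\b[a-fA-F0-9]{64}\b"),
}


def extract_iocs(text):
    """Return (stix_object_path, value) pairs matched by the regexes above."""
    found = []
    for path, pattern in IOC_PATTERNS.items():
        for value in pattern.findall(text):
            found.append((path, value))
    return found


def to_stix_bundle(text, malware_name):
    """Link every IOC in `text` to one malware object (an assumed stand-in
    for the entity that relation extraction would normally supply)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    common = {"spec_version": "2.1", "created": now, "modified": now}
    malware = {
        "type": "malware",
        "id": f"malware--{uuid.uuid4()}",
        "name": malware_name,
        "is_family": False,
        **common,
    }
    objects = [malware]
    for path, value in extract_iocs(text):
        indicator = {
            "type": "indicator",
            "id": f"indicator--{uuid.uuid4()}",
            "pattern": f"[{path} = '{value}']",
            "pattern_type": "stix",
            "valid_from": now,
            **common,
        }
        relationship = {
            "type": "relationship",
            "id": f"relationship--{uuid.uuid4()}",
            "relationship_type": "indicates",
            "source_ref": indicator["id"],
            "target_ref": malware["id"],
            **common,
        }
        objects += [indicator, relationship]
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    report = "The loader contacts 203.0.113.7.\nIOCs:\n" + "a" * 64
    print(json.dumps(to_stix_bundle(report, "ExampleRAT"), indent=2))
```

In this sketch the hash listed under "IOCs:" has no surrounding sentence, which is the noncontextual case the abstract mentions; here it is attached to the malware name only because the caller passes that name in.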
Pages: 85-104
Number of pages: 20