A Supervised Machine Learning Based Approach for Automatically Extracting High-Level Threat Intelligence from Unstructured Sources

Cited by: 31
Authors
Ghazi, Yumna [1]
Anwar, Zahid [1]
Mumtaz, Rafia [1]
Saleem, Shahzad [1]
Tahir, Ali [1]
Affiliations
[1] NUST, SEECS, Dept Comp, Islamabad, Pakistan
Source
2018 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT 2018) | 2018
Keywords
Cyber Threat Intelligence; Natural Language Processing; Tactics, Techniques and Procedures (TTPs); STIX; Indicators of Compromise
DOI
10.1109/FIT.2018.00030
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The last few years have seen a radical shift in the cyber defense paradigm from reactive to proactive, and this change is marked by the steadily increasing trend of Cyber Threat Intelligence (CTI) sharing. Currently, there are numerous Open Source Intelligence (OSINT) sources providing periodically updated threat feeds that are fed into various analytical solutions. At this point, there is an excessive amount of data being produced from such sources, both structured (STIX, OpenIOC, etc.) as well as unstructured (blacklists, etc.). However, more often than not, the level of detail required for making informed security decisions is missing from threat feeds, since most indicators are atomic in nature, like IPs and hashes, which are usually rather volatile. These feeds distinctly lack strategic threat information, like attack patterns and techniques that truly represent the behavior of an attacker or an exploit. Moreover, there is a lot of duplication in threat information and no single place where one could explore the entirety of a threat, hence requiring hundreds of man hours for sifting through numerous sources - trying to discern signal from noise - to find all the credible information on a threat. We have made use of natural language processing to extract threat feeds from unstructured cyber threat information sources with approximately 70% precision, providing comprehensive threat reports in standards like STIX, which is a widely accepted industry standard that represents CTI. The automation of an otherwise tedious manual task would ensure the timely gathering and sharing of relevant CTI that would give organizations the edge to be able to proactively defend against known as well as unknown threats.
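The flow described in the abstract (unstructured text in, structured threat objects out) can be illustrated with a minimal sketch. This is not the authors' method: the paper uses a supervised machine learning model, whereas the sketch below swaps in regular expressions and a hypothetical keyword lexicon (`TTP_KEYWORDS`) purely for illustration. The function names are assumptions, and the output is only STIX-like JSON that has not been validated against the STIX specification.

```python
import json
import re
import uuid
from datetime import datetime, timezone

# Hypothetical lexicon: phrase -> technique label. A stand-in for the
# supervised classifier described in the paper, for illustration only.
TTP_KEYWORDS = {
    "spear phishing": "Spear Phishing",
    "credential dumping": "Credential Dumping",
}

# Simple patterns for atomic indicators (no octet-range validation).
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
SHA256_RE = re.compile(r"\b[a-fA-F0-9]{64}\b")


def extract(text):
    """Pull atomic indicators and keyword-matched techniques from raw text."""
    lowered = text.lower()
    return {
        "ipv4": sorted(set(IPV4_RE.findall(text))),
        "sha256": sorted(set(SHA256_RE.findall(text))),
        "ttps": sorted(
            {label for phrase, label in TTP_KEYWORDS.items() if phrase in lowered}
        ),
    }


def to_stix_like_bundle(found):
    """Wrap extracted items in STIX-like dictionaries (not spec-validated)."""
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    patterns = [f"[ipv4-addr:value = '{ip}']" for ip in found["ipv4"]]
    patterns += [f"[file:hashes.'SHA-256' = '{h}']" for h in found["sha256"]]

    objects = []
    for pattern in patterns:
        objects.append({
            "type": "indicator",
            "spec_version": "2.1",
            "id": f"indicator--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "pattern": pattern,
            "pattern_type": "stix",
            "valid_from": now,
        })
    for label in found["ttps"]:
        objects.append({
            "type": "attack-pattern",
            "spec_version": "2.1",
            "id": f"attack-pattern--{uuid.uuid4()}",
            "created": now,
            "modified": now,
            "name": label,
        })
    return {"type": "bundle", "id": f"bundle--{uuid.uuid4()}", "objects": objects}


if __name__ == "__main__":
    # Made-up report text using a documentation-reserved IP address.
    report = (
        "The actor relied on spear phishing emails; the dropper beaconed to "
        "198.51.100.7 and had SHA-256 " + "ab" * 32 + " on disk."
    )
    print(json.dumps(to_stix_like_bundle(extract(report)), indent=2))
```

The keyword lookup is the weakest part of the sketch: it only finds techniques that are named verbatim, which is the gap a trained classifier is meant to close.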
Pages: 129-134
Page count: 6