A machine learning framework for investigating data breaches based on semantic analysis of adversary's attack patterns in threat intelligence repositories

Cited by: 36
Authors
Noor, Umara [1 ,5 ]
Anwar, Zahid [2 ,4 ]
Malik, Asad Waqar [3 ]
Khan, Sharifullah [3 ]
Saleem, Shahzad [3 ]
Affiliations
[1] NUST, Informat Technol, Islamabad, Pakistan
[2] NUST, Islamabad, Pakistan
[3] NUST, Sch Elect Engn & Comp Sci, Islamabad, Pakistan
[4] Fontbonne Univ, Math & Comp Sci, St Louis, MO 63105 USA
[5] Int Islamic Univ, Fac Basic & Appl Sci, Dept Comp Sci & Software Engn, Islamabad, Pakistan
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2019 / Vol. 95
Keywords
Cyber threat intelligence; Data breach investigation; Tactics Techniques and Procedures; Indicators of compromise; Belief network; Latent Semantic Indexing; CYBER; SECURITY;
DOI
10.1016/j.future.2019.01.022
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202 ;
Abstract
With the ever-increasing number of cyber data breaches, the manual process of sifting through tons of security logs to investigate cyber-attacks is error-prone and time-consuming. Signature-based deep search solutions only give accurate results if the threat artifacts are precisely provided. With the burgeoning variety of sophisticated cyber threats that share common attack patterns and use the same attack tools, a timely investigation is nearly impossible. There is a need to automate the threat analysis process by mapping adversaries' Tactics, Techniques and Procedures (TTPs) to attack goals and detection mechanisms. In this paper, a novel machine-learning-based framework is proposed that identifies cyber threats based on observed attack patterns. The framework semantically relates threats and TTPs extracted from well-known threat sources with associated detection mechanisms to form a semantic network. This network is then used to determine threat occurrences by forming probabilistic relationships between threats and TTPs. The framework is trained using a TTP taxonomy dataset, and its performance is evaluated with threat artifacts reported in threat reports. The framework efficiently identifies attacks with 92% accuracy and low false positives, even in the case of lost and spurious TTPs. The average detection time of a data breach incident is 0.15 s for a network trained with 133 TTPs from 45 threat families. (C) 2019 Elsevier B.V. All rights reserved.
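The abstract describes relating threats to TTPs semantically and then scoring which threat best explains a set of observed TTPs. The following is only a minimal sketch of the Latent Semantic Indexing idea named in the keywords, not the authors' implementation: it omits the belief-network stage entirely, and the threat names, TTP labels, and the helper names `build_matrix` and `rank_threats` are hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical toy data (not from the paper): threat family -> associated TTPs.
THREAT_TTPS = {
    "pos_ram_scraper": ["memory_scraping", "credential_theft", "data_exfiltration"],
    "spear_phishing_campaign": ["phishing_email", "malicious_attachment", "credential_theft"],
    "ransomware": ["phishing_email", "file_encryption", "ransom_note"],
}


def build_matrix(threat_ttps):
    """Build a binary TTP-by-threat matrix (rows: TTPs, columns: threats)."""
    threats = sorted(threat_ttps)
    ttps = sorted({t for members in threat_ttps.values() for t in members})
    row = {t: i for i, t in enumerate(ttps)}
    matrix = np.zeros((len(ttps), len(threats)))
    for col, threat in enumerate(threats):
        for ttp in threat_ttps[threat]:
            matrix[row[ttp], col] = 1.0
    return matrix, ttps, threats


def rank_threats(observed, threat_ttps, k=2):
    """Rank threats by cosine similarity to the observed TTPs in a rank-k latent space."""
    matrix, ttps, threats = build_matrix(threat_ttps)
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    keep = s[:k] > 1e-12                      # drop numerically zero components
    u_k, s_k, v_k = u[:, :k][:, keep], s[:k][keep], vt[:k, :][keep].T
    # Fold the observed-TTP vector into the latent space: q_hat = q^T U_k S_k^-1.
    query = np.array([1.0 if t in observed else 0.0 for t in ttps])
    q_hat = (query @ u_k) / s_k
    # Cosine similarity between the folded query and each threat's latent vector.
    denom = np.linalg.norm(v_k, axis=1) * np.linalg.norm(q_hat) + 1e-12
    scores = (v_k @ q_hat) / denom
    order = np.argsort(-scores)
    return [(threats[i], float(scores[i])) for i in order]


if __name__ == "__main__":
    for name, score in rank_threats({"memory_scraping", "data_exfiltration"}, THREAT_TTPS):
        print(f"{name}: {score:.3f}")
```

Because the comparison happens in the reduced latent space rather than by exact matching, a query with a missing or extra TTP can still land near the right threat, which is the property the abstract alludes to when it mentions lost and spurious TTPs. The rank `k` is an assumed tuning parameter here.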
Pages: 467-487
Number of pages: 21