A tree-based machine learning methodology to automatically classify software vulnerabilities

Cited by: 4
Authors
Aivatoglou, Georgios [1]
Anastasiadis, Mike [1]
Spanos, Georgios [1]
Voulgaridis, Antonis [1]
Votis, Konstantinos [1]
Tzovaras, Dimitrios [1]
Affiliations
[1] Informat Technol Inst, Ctr Res & Technol Hellas, Thessaloniki, Greece
Source
PROCEEDINGS OF THE 2021 IEEE INTERNATIONAL CONFERENCE ON CYBER SECURITY AND RESILIENCE (IEEE CSR) | 2021
Funding
European Union Horizon 2020;
Keywords
Software Vulnerability categorization; Cyber-security; Machine Learning; Decision Trees; Random Forests; Gradient Boosting;
DOI
10.1109/CSR51186.2021.9527965
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Software vulnerabilities have become a major problem for security analysts, since the number of new vulnerabilities is constantly growing. This created a need for a categorization system to group and handle vulnerabilities more efficiently. To that end, the MITRE Corporation introduced the Common Weakness Enumeration (CWE), a list of the most common software and hardware vulnerabilities. However, manually understanding and analyzing new vulnerabilities is a very slow and exhausting process for security experts. For this reason, this paper introduces a new automated classification methodology based on the textual vulnerability descriptions in the National Vulnerability Database (NVD). The proposed methodology combines textual analysis and tree-based machine learning techniques to classify vulnerabilities automatically. In the experiments, the proposed methodology achieved an overall accuracy close to 80%.
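The idea summarized in the abstract, turning a textual description into features and letting a tree-based model pick a category, can be illustrated with a minimal sketch. This is not the authors' pipeline: the record does not give their features, models, or data, so the sketch assumes binary term-presence features and a single decision tree split on Gini impurity. The descriptions in `TRAIN` are invented for illustration (they are not NVD entries), and the helper names `tokenize`, `gini`, `build_tree`, and `classify` are hypothetical.

```python
import re
from collections import Counter

# Toy, invented descriptions (NOT real NVD entries), each labeled with a CWE identifier.
TRAIN = [
    ("Cross-site scripting in the search form allows script injection", "CWE-79"),
    ("Stored cross-site scripting via the comment field", "CWE-79"),
    ("SQL injection in the login page via the username parameter", "CWE-89"),
    ("SQL injection allows attackers to read database tables", "CWE-89"),
    ("Buffer overflow in the image parser causes memory corruption", "CWE-119"),
    ("Heap buffer overflow when parsing a crafted file", "CWE-119"),
]

def tokenize(text):
    """Lowercase a description and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

def build_tree(rows):
    """Grow a decision tree whose nodes test for the presence of one term.

    rows is a list of (token_set, label) pairs. A leaf is a label string;
    an inner node is a tuple (term, subtree_if_present, subtree_if_absent).
    """
    labels = [label for _, label in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:
        return majority  # pure leaf
    best = None
    vocabulary = sorted(set().union(*(tokens for tokens, _ in rows)))
    for term in vocabulary:
        with_term = [r for r in rows if term in r[0]]
        without = [r for r in rows if term not in r[0]]
        if not with_term or not without:
            continue  # the term does not split the rows
        # Weighted Gini impurity of the two children; lower is better.
        score = (len(with_term) * gini([l for _, l in with_term])
                 + len(without) * gini([l for _, l in without])) / len(rows)
        if best is None or score < best[0]:
            best = (score, term, with_term, without)
    if best is None:
        return majority  # no term separates the rows
    _, term, with_term, without = best
    return (term, build_tree(with_term), build_tree(without))

def classify(tree, text):
    """Walk the tree using the terms present in a new description."""
    tokens = tokenize(text)
    while isinstance(tree, tuple):
        term, present, absent = tree
        tree = present if term in tokens else absent
    return tree

tree = build_tree([(tokenize(d), label) for d, label in TRAIN])
print(classify(tree, "Reflected cross-site scripting in the profile page"))
```

A single tree over six toy examples says nothing about real accuracy. The keywords of the record name Decision Trees, Random Forests, and Gradient Boosting, and ensembles of that kind would normally be trained on a much larger labeled corpus with richer text features.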
Pages: 312-317
Page count: 6