Text mining based an automatic model for software vulnerability severity prediction

Cited by: 1
Authors
Malhotra, Ruchika [1]
Vidushi [1]
Affiliation
[1] Delhi Technol Univ, Dept Software Engn, New Delhi, India
Keywords
Text mining; Vulnerability severity level; Feature selection; Prediction model; Machine learning
DOI
10.1007/s13198-024-02371-2
Chinese Library Classification
T [Industrial Technology]
Discipline classification code
08
Abstract
The number of software vulnerabilities reported each year is increasing exponentially, leading to the exploitation of software systems. When a vulnerability is reported, it must therefore be patched as early as possible, which takes time and effort. To direct that effort properly, the severity of each vulnerability needs to be predicted so that the more critical ones can be given higher priority. This calls for a model that can analyze the available vulnerability data and predict severity. The experiment in this study is conducted on vulnerability reports of five Mozilla software products. Because the data is textual, text mining techniques are applied to preprocess it and form feature vectors. Textual input produces very high-dimensional feature vectors, which makes dimensionality reduction necessary, so feature selection is performed using chi-square and information gain. Seven machine learning algorithms are chosen to develop the classifiers; combined with the two feature selection techniques, this yields fourteen software vulnerability severity prediction models (SVSPMs). The result analysis identifies the best-performing SVSPM. It is concluded that the model performed better for the medium and critical severity levels. Of the two feature selection techniques, information gain gave better results. The number of features at which the SVSPMs gave good results is also determined, as is the best machine learning algorithm for each dataset. Finally, the Friedman and Wilcoxon signed-rank tests are used to identify significant differences among the SVSPMs.
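As an illustration of the information-gain feature selection step mentioned in the abstract, the following is a minimal sketch only, not the authors' implementation. The toy reports, the severity labels, and the helper names (`entropy`, `information_gain`, `top_terms`) are all hypothetical, and term presence is used as a simple binary feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(docs, labels, term):
    """Entropy reduction obtained by splitting documents on presence of `term`."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


def top_terms(docs, labels, k):
    """Return the k vocabulary terms with the highest information gain."""
    vocab = sorted(set().union(*docs))
    ranked = sorted(vocab, key=lambda t: information_gain(docs, labels, t),
                    reverse=True)
    return ranked[:k]


# Hypothetical toy data, NOT taken from the Mozilla reports used in the study.
reports = [
    "crash on malformed input",
    "remote code execution crash",
    "minor ui glitch",
    "ui typo in dialog",
]
severity = ["critical", "critical", "low", "low"]

# Each report becomes a set of tokens (binary term-presence features).
docs = [set(r.split()) for r in reports]
selected = top_terms(docs, severity, k=2)
print(selected)
```

In this toy corpus the terms that perfectly separate the two severity classes rank first; a real pipeline would add preprocessing such as stop-word removal and stemming, and would tune the number of retained features.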
Pages: 3706-3724
Page count: 19