An automatic software vulnerability classification framework using term frequency-inverse gravity moment and feature selection

Cited by: 22
Authors
Chen, Jinfu [1]
Kudjo, Patrick Kwaku [3]
Mensah, Solomon [2]
Brown, Selasie Aformaley [3]
Akorfu, George [4]
Affiliations
[1] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[2] Univ Ghana, Dept Comp Sci, Legon, Ghana
[3] Univ Profess Studies, Dept Informat Technol, Accra, Ghana
[4] Wisconsin Int Univ Coll, Dept Business Comp, Accra, Ghana
Funding
National Natural Science Foundation of China;
Keywords
Software vulnerability; Classification; Feature selection; Machine learning algorithms; Severity; Term-weighting; VECTOR-SPACE MODEL; WEIGHTING SCHEMES; TEXT CLASSIFICATION; SEVERITY; PREDICTION; ALGORITHM;
DOI
10.1016/j.jss.2020.110616
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202; 0835;
Abstract
Vulnerability classification is an important activity in software development and software quality maintenance. A typical vulnerability classification model involves three stages: term selection, in which the relevant terms are identified via feature selection; term weighting, in which the document weights for the selected terms are computed; and classifier learning. The term frequency-inverse document frequency (TF-IDF) model is the most widely used term-weighting metric for vulnerability classification. However, several issues hinder the effectiveness of the TF-IDF model for document classification. To address this problem, we propose and evaluate a general framework for vulnerability severity classification using the term frequency-inverse gravity moment (TF-IGM). Specifically, we extensively compare the term frequency-inverse gravity moment, term frequency-inverse document frequency, and information gain feature selection using five machine learning algorithms on ten vulnerable software applications containing a total of 27,248 security vulnerabilities. The experimental results show that: (i) the TF-IGM model is a promising term-weighting metric for vulnerability classification compared to the classical term-weighting metric, (ii) the effectiveness of feature selection on vulnerability classification varies significantly across the studied datasets, and (iii) feature selection improves vulnerability classification. © 2020 Elsevier Inc. All rights reserved.
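The abstract names TF-IGM but does not give its formula. As an illustration only, the sketch below follows the commonly cited formulation of inverse gravity moment, which is an assumption here and not taken from this record: for each term, the class-specific document frequencies are sorted in descending order as f_1 >= f_2 >= ... >= f_m, igm = f_1 / sum(f_r * r), and the term weight is tf * (1 + lam * igm). The function names, the choice of document frequency per class, and the coefficient `lam=7.0` are likewise hypothetical choices and may differ from what the authors used.

```python
from collections import Counter, defaultdict


def igm_factors(docs, labels, lam=7.0):
    """Return {term: 1 + lam * igm(term)} for tokenized docs and their class labels.

    Assumed formulation (not stated in the abstract):
    igm(term) = f_1 / sum_r(f_r * r), where f_r is the r-th largest
    class-specific document frequency of the term.
    """
    classes = sorted(set(labels))
    # Count, per term, how many documents of each class contain it.
    class_df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            class_df[term][label] += 1

    factors = {}
    for term, per_class in class_df.items():
        # Descending class frequencies; rank 1 is the class where the term is most common.
        freqs = sorted((per_class.get(c, 0) for c in classes), reverse=True)
        moment = sum(f * rank for rank, f in enumerate(freqs, start=1))
        factors[term] = 1.0 + lam * freqs[0] / moment
    return factors


def tf_igm(tokens, factors):
    """Weight one tokenized document: raw term frequency times the global IGM factor."""
    tf = Counter(tokens)
    # Terms never seen during fitting get weight 0.0 in this sketch.
    return {term: count * factors.get(term, 0.0) for term, count in tf.items()}
```

Under this formulation, a term concentrated in a single severity class gets the largest factor (1 + lam), while a term spread evenly across classes gets a smaller one, which is the class-aware behaviour that a purely document-level IDF cannot capture.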
Pages: 20