Learning to Predict Severity of Software Vulnerability Using Only Vulnerability Description

Cited by: 129
Authors
Han, Zhuobing [1]
Li, Xiaohong [1]
Xing, Zhenchang [2]
Liu, Hongtao [1]
Feng, Zhiyong [3]
Affiliations
[1] Tianjin Univ, Sch Comp Sci & Technol, Tianjin Key Lab Adv Networking TANK, Tianjin, Peoples R China
[2] Australian Natl Univ, Res Sch Comp Sci, Canberra, ACT, Australia
[3] Tianjin Univ, Sch Comp Software, Tianjin, Peoples R China
Source
2017 IEEE INTERNATIONAL CONFERENCE ON SOFTWARE MAINTENANCE AND EVOLUTION (ICSME) | 2017
Funding
National Science Foundation (United States);
Keywords
vulnerability severity prediction; multi-class classification; deep learning; mining software repositories;
DOI
10.1109/ICSME.2017.52
Chinese Library Classification number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
Software vulnerabilities pose significant security risks to the host computing system. Faced with the continuous disclosure of software vulnerabilities, system administrators must prioritize their efforts, triaging the most critical vulnerabilities to address first. Many vulnerability scoring systems have been proposed, but they all require expert knowledge to determine intricate vulnerability metrics. In this paper, we propose a deep learning approach that predicts the multi-class severity level of a software vulnerability using only its description. Compared with intricate vulnerability metrics, the vulnerability description is "surface level" information about how a vulnerability works. To exploit vulnerability descriptions for predicting severity, discriminative features of the descriptions have to be defined. This is a challenging task due to the diversity of software vulnerabilities and the richness of vulnerability descriptions. Instead of relying on manual feature engineering, our approach uses word embeddings and a one-layer shallow Convolutional Neural Network (CNN) to automatically capture discriminative word and sentence features of vulnerability descriptions for predicting vulnerability severity. We exploit large amounts of vulnerability data from the Common Vulnerabilities and Exposures (CVE) database to train and test our approach.
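The pipeline named in the abstract (word embeddings feeding a one-layer CNN that outputs a multi-class severity level) can be illustrated with a forward pass in plain Python. This is only a minimal sketch under stated assumptions, not the authors' implementation: the class name `ShallowTextCNN`, the label set `SEVERITY_LEVELS`, the whitespace tokenizer, and every hyperparameter are hypothetical, and the weights are random and untrained, so the predicted label carries no meaning.

```python
import math
import random

# Hypothetical label set; the abstract only says "multi-class severity level".
SEVERITY_LEVELS = ["low", "medium", "high", "critical"]


def tokenize(text):
    """Naive whitespace tokenizer (an assumption, not the paper's preprocessing)."""
    return text.lower().split()


class ShallowTextCNN:
    """Embedding lookup -> one convolution layer -> max-over-time pooling -> softmax."""

    def __init__(self, vocab, embed_dim=8, num_filters=4, window=3,
                 num_classes=len(SEVERITY_LEVELS), seed=0):
        rng = random.Random(seed)

        def rand_vec(n):
            return [rng.uniform(-0.5, 0.5) for _ in range(n)]

        self.window = window
        # Random vectors stand in for pretrained word embeddings.
        self.embeddings = {word: rand_vec(embed_dim) for word in vocab}
        self.unknown = [0.0] * embed_dim
        # Each filter spans `window` consecutive word vectors.
        self.filters = [rand_vec(window * embed_dim) for _ in range(num_filters)]
        self.filter_bias = [0.0] * num_filters
        self.out_weights = [rand_vec(num_filters) for _ in range(num_classes)]
        self.out_bias = [0.0] * num_classes

    def forward(self, tokens):
        vectors = [self.embeddings.get(tok, self.unknown) for tok in tokens]
        # Pad short inputs so that at least one window fits.
        while len(vectors) < self.window:
            vectors.append(self.unknown)

        pooled = []
        for filt, bias in zip(self.filters, self.filter_bias):
            # Starting at 0.0 applies ReLU: the max of ReLU(x) equals ReLU(max x).
            best = 0.0
            for start in range(len(vectors) - self.window + 1):
                patch = [x for vec in vectors[start:start + self.window] for x in vec]
                activation = sum(w * x for w, x in zip(filt, patch)) + bias
                best = max(best, activation)
            pooled.append(best)  # max-over-time pooling: one value per filter

        logits = [sum(w * p for w, p in zip(row, pooled)) + b
                  for row, b in zip(self.out_weights, self.out_bias)]
        peak = max(logits)  # subtract the max for numerical stability
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Made-up description in the style of a CVE entry.
description = ("Buffer overflow in the web server allows remote attackers "
               "to execute arbitrary code")
tokens = tokenize(description)
model = ShallowTextCNN(vocab=sorted(set(tokens)))
probs = model.forward(tokens)
predicted = SEVERITY_LEVELS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a trained model the embeddings, filters, and output weights would be learned from labeled CVE descriptions; the sketch only shows how a variable-length description is reduced to a fixed-size feature vector and then to class probabilities.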
Pages: 125-136
Number of pages: 12