Combining Software Metrics and Text Features for Vulnerable File Prediction

Cited by: 35
Authors
Zhang, Yun [1]
Lo, David [2]
Xia, Xin [1]
Xu, Bowen [1]
Sun, Jianling [1]
Li, Shanping [1]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore
Source
2015 20TH INTERNATIONAL CONFERENCE ON ENGINEERING OF COMPLEX COMPUTER SYSTEMS (ICECCS) | 2015
Keywords
Vulnerable File; Machine Learning; Text Mining; Validation
DOI
10.1109/ICECCS.2015.15
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, to help developers reduce the time and effort required to build highly secure software, a number of prediction models built on different kinds of features have been proposed to identify vulnerable source code files. In this paper, we propose a novel approach, VULPREDICTOR, to predict vulnerable files; it combines software metrics and text features to build a composite prediction model. VULPREDICTOR first builds 6 underlying classifiers on a training set of vulnerable and non-vulnerable files represented by their software metrics and text features, and then constructs a meta classifier to process the outputs of the 6 underlying classifiers. We evaluate our solution on datasets from three web applications, Drupal, PHPMyAdmin, and Moodle, which together contain 3,466 files and 223 vulnerabilities. The experimental results show that VULPREDICTOR can achieve F1 and EffectivenessRatio@20% scores of up to 0.683 and 75%, respectively. On average across the 3 projects, VULPREDICTOR improves the F1 and EffectivenessRatio@20% scores of the best-performing state-of-the-art approaches proposed by Walden et al. by 46.53% and 14.93%, respectively.
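The two-level composition described in the abstract (underlying classifiers whose outputs feed a meta classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. Everything specific in it is an assumption made for illustration: it uses two underlying classifiers (one per feature view) instead of six, plain logistic regression as a stand-in for whatever learners the paper uses, invented toy feature values, and it trains the meta classifier on in-sample outputs of the underlying classifiers for brevity.

```python
import math

def sigmoid(z):
    # Clamp z so math.exp cannot overflow.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(model, x):
    """Probability of the positive (vulnerable) class for one feature vector."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))

def train_logistic(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    n, d = len(rows), len(rows[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(rows, labels):
            err = predict_proba((weights, bias), x) - y
            grad_b += err
            for j in range(d):
                grad_w[j] += err * x[j]
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Hypothetical toy data: one row per file, values scaled to [0, 1].
# Metric view: [size, complexity]; text view: [freq of token A, freq of token B].
METRIC_ROWS = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.6], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
TEXT_ROWS = [[1.0, 0.2], [0.8, 0.1], [0.9, 0.3], [0.1, 0.9], [0.2, 0.8], [0.0, 0.7]]
LABELS = [1, 1, 1, 0, 0, 0]  # 1 = vulnerable, 0 = non-vulnerable

# Level 1: one underlying classifier per feature view.
metric_model = train_logistic(METRIC_ROWS, LABELS)
text_model = train_logistic(TEXT_ROWS, LABELS)

# Level 2: the meta classifier is trained on the underlying classifiers' outputs.
META_ROWS = [
    [predict_proba(metric_model, m), predict_proba(text_model, t)]
    for m, t in zip(METRIC_ROWS, TEXT_ROWS)
]
meta_model = train_logistic(META_ROWS, LABELS)

def predict_vulnerable(metric_features, text_features):
    """Run both underlying classifiers, then combine their outputs."""
    meta_input = [
        predict_proba(metric_model, metric_features),
        predict_proba(text_model, text_features),
    ]
    return predict_proba(meta_model, meta_input)
```

Calling `predict_vulnerable(metric_features, text_features)` returns a probability; a file would be flagged as vulnerable when that value exceeds a chosen threshold such as 0.5. A more careful setup would train the meta classifier on held-out predictions rather than in-sample ones.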
Pages: 40-49
Page count: 10