The Importance of Accounting for Real-World Labelling When Predicting Software Vulnerabilities

Cited by: 55
Authors
Jimenez, Matthieu [1 ]
Rwemalika, Renaud [1 ]
Papadakis, Mike [1 ]
Sarro, Federica [2 ]
Le Traon, Yves [1 ]
Harman, Mark [2 ,3 ]
Affiliations
[1] Univ Luxembourg, Esch Sur Alzette, Luxembourg
[2] UCL, London, England
[3] Facebook, London, England
Source
ESEC/FSE 2019: Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering | 2019
Keywords
Software Vulnerabilities; Machine Learning; Prediction Modelling;
DOI
10.1145/3338906.3338941
CLC Number
TP31 [Computer Software];
Discipline Codes
081202; 0835;
Abstract
Previous work on vulnerability prediction assumes that predictive models are trained with perfect labelling information, i.e., labels that include information from future, as yet undiscovered, vulnerabilities. In this paper we present results from a comprehensive empirical study of 1,898 real-world vulnerabilities reported in 74 releases of three security-critical open source systems (Linux Kernel, OpenSSL and Wireshark). Our study investigates the effectiveness of three previously proposed vulnerability prediction approaches in two settings: with and without the unrealistic labelling assumption. The results reveal that the unrealistic labelling assumption can profoundly mislead the scientific conclusions drawn: apparently highly effective and deployable prediction results vanish when we fully account for realistically available labelling in the experimental methodology. More precisely, mean MCC values of predictive effectiveness drop from 0.77, 0.65 and 0.43 to 0.08, 0.22 and 0.10 for Linux Kernel, OpenSSL and Wireshark, respectively. Similar results are obtained for precision, recall and other assessments of predictive efficacy. The community therefore needs to upgrade its experimental and empirical methodology for the evaluation and development of vulnerability prediction models in order to ensure robust and actionable scientific findings.
Pages: 695-705 (11 pages)