Machine Learning Methods for Software Vulnerability Detection

Citations: 44
Authors
Chernis, Boris [1]
Verma, Rakesh [1]
Affiliation
[1] Univ Houston, Houston, TX 77004 USA
Source
IWSPA '18: PROCEEDINGS OF THE FOURTH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS | 2018
Funding
National Science Foundation (USA);
Keywords
static analysis; buffer overflow; vulnerability detection; n-grams; suffix trees; software metrics; machine learning; PREDICTING FAULTS;
DOI
10.1145/3180445.3180453
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Software vulnerabilities are a primary concern in the IT security industry, because attackers who discover them can often exploit them. However, complex programs, particularly those written in a relatively low-level language like C, are difficult to scan fully for bugs, even when both manual and automated techniques are used. Because verifying that code is securely written has proven to be a non-trivial task, both static and dynamic analysis techniques have been heavily investigated; this work focuses on static analysis. The contribution of this paper is a demonstration that a large percentage of bugs can be caught by extracting text features from functions in C source code and analyzing them with a machine learning classifier. Both relatively simple features (character count, character diversity, entropy, maximum nesting depth, arrow count, "if" count, "if" complexity, "while" count, and "for" count) and complex features (character n-grams, word n-grams, and suffix trees) were extracted from these functions. Unexpectedly, the simple features outperformed the complex features (74% accuracy versus 69% accuracy).
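To make the feature list in the abstract concrete, the following is a minimal Python sketch of how several of the simple features, plus character n-grams, might be computed from the text of one C function. It is an illustration under stated assumptions, not the authors' implementation: the function names `simple_features` and `char_ngrams` are hypothetical, "character diversity" is assumed to mean the number of distinct characters, nesting depth is assumed to be brace depth, and "if" complexity is omitted because the abstract does not define it.

```python
import math
import re
from collections import Counter


def simple_features(func_src: str) -> dict:
    """Compute simple text features for one C function given as a string.

    Hypothetical helper; feature definitions are assumptions, not the paper's.
    """
    counts = Counter(func_src)
    total = len(func_src)

    # Shannon entropy (bits per character) of the character distribution.
    entropy = 0.0
    if total:
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

    # Maximum brace nesting depth. Naive: braces inside string literals
    # or comments are counted as well.
    depth = 0
    max_depth = 0
    for ch in func_src:
        if ch == "{":
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)

    def keyword_count(word: str) -> int:
        # Whole-word matches only, so "if" does not match inside "ifdef".
        return len(re.findall(r"\b" + re.escape(word) + r"\b", func_src))

    return {
        "char_count": total,
        "char_diversity": len(counts),  # assumed: number of distinct characters
        "entropy": entropy,
        "max_nesting_depth": max_depth,
        "arrow_count": func_src.count("->"),
        "if_count": keyword_count("if"),
        "while_count": keyword_count("while"),
        "for_count": keyword_count("for"),
    }


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams in text."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


if __name__ == "__main__":
    sample = (
        "void f(struct s *p) { if (p->n > 0) { while (p->n) { p->n--; } } "
        "for (;;) { break; } }"
    )
    print(simple_features(sample))
    print(char_ngrams("abab", 2))
```

Each function's feature dictionary could then be turned into a numeric vector and passed to a classifier of one's choice; the abstract does not say which classifier was used, so none is assumed here.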
Pages: 31-39
Page count: 9