Machine Learning Methods for Software Vulnerability Detection

Cited by: 41
Authors
Chernis, Boris [1]
Verma, Rakesh [1]
Affiliation
[1] Univ Houston, Houston, TX 77004 USA
Source
IWSPA '18: PROCEEDINGS OF THE FOURTH ACM INTERNATIONAL WORKSHOP ON SECURITY AND PRIVACY ANALYTICS | 2018
Funding
US National Science Foundation;
Keywords
static analysis; buffer overflow; vulnerability detection; n-grams; suffix trees; software metrics; machine learning; PREDICTING FAULTS;
DOI
10.1145/3180445.3180453
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
Software vulnerabilities are a primary concern in the IT security industry, because malicious hackers who discover them can often exploit them for nefarious purposes. However, complex programs, particularly those written in a relatively low-level language such as C, are difficult to scan fully for bugs, even when both manual and automated techniques are used. Because analyzing code and verifying that it is securely written is a non-trivial task, both static and dynamic analysis techniques have been investigated extensively; this work focuses on static analysis. The contribution of this paper is a demonstration that a large percentage of bugs can be caught by extracting text features from functions in C source code and analyzing them with a machine learning classifier. Both relatively simple features (character count, character diversity, entropy, maximum nesting depth, arrow count, "if" count, "if" complexity, "while" count, and "for" count) and more complex features (character n-grams, word n-grams, and suffix trees) were extracted from these functions. Unexpectedly, the simple features outperformed the complex features (74% accuracy versus 69%).
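The feature types named in the abstract can be illustrated with a short Python sketch. The abstract names the features but does not define them, so the definitions below are assumptions: character diversity is taken as the number of distinct characters, entropy as Shannon entropy over characters, nesting depth as curly-brace depth, and "if" complexity as the number of `&&`/`||` operators inside `if` conditions. The word tokenizer and all function names (`simple_features`, `char_ngrams`, `word_ngrams`, `EXAMPLE_FUNC`) are hypothetical, and neither suffix trees nor the classifier are shown.

```python
import math
import re
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def max_nesting_depth(text: str) -> int:
    """Deepest curly-brace nesting level (braces in strings/comments are NOT skipped)."""
    depth = best = 0
    for ch in text:
        if ch == "{":
            depth += 1
            best = max(best, depth)
        elif ch == "}":
            depth = max(depth - 1, 0)
    return best


def if_conditions(text: str) -> list[str]:
    """Return the parenthesised condition of each `if`, honouring nested parentheses."""
    conditions = []
    for m in re.finditer(r"\bif\s*\(", text):
        start = i = m.end()  # index just after the opening '('
        depth = 1
        while i < len(text) and depth > 0:
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
            i += 1
        end = i - 1 if depth == 0 else i  # exclude the closing ')' when found
        conditions.append(text[start:end])
    return conditions


def simple_features(func_src: str) -> dict[str, float]:
    """Hypothetical versions of the abstract's simple features for one C function."""
    conds = if_conditions(func_src)
    return {
        "char_count": len(func_src),
        "char_diversity": len(set(func_src)),  # assumption: distinct characters
        "entropy": shannon_entropy(func_src),
        "max_nesting_depth": max_nesting_depth(func_src),
        "arrow_count": func_src.count("->"),
        "if_count": len(re.findall(r"\bif\b", func_src)),
        # assumption: boolean operators inside if-conditions
        "if_complexity": sum(c.count("&&") + c.count("||") for c in conds),
        "while_count": len(re.findall(r"\bwhile\b", func_src)),
        "for_count": len(re.findall(r"\bfor\b", func_src)),
    }


def char_ngrams(text: str, n: int) -> Counter:
    """Counts of overlapping character n-grams."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def word_ngrams(text: str, n: int) -> Counter:
    """Counts of token n-grams; assumed tokens are words or single punctuation chars."""
    tokens = re.findall(r"\w+|[^\w\s]", text)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


EXAMPLE_FUNC = r"""
int copy(struct buf *b, const char *src) {
    if (b == NULL || src == NULL) {
        return -1;
    }
    for (int i = 0; src[i] != '\0'; i++) {
        if (i >= b->len && b->len > 0) {
            return -2;
        }
        b->data[i] = src[i];
    }
    return 0;
}
"""

if __name__ == "__main__":
    for name, value in simple_features(EXAMPLE_FUNC).items():
        print(f"{name}: {value}")
    print(char_ngrams(EXAMPLE_FUNC, 3).most_common(5))
```

Each function's feature dictionary (or n-gram counts) would form one input row for a classifier. Because this regex-based counting does not skip comments or string literals, a more careful pipeline would probably strip those before extracting features.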
Pages: 31–39
Page count: 9