An Empirical Study of the Imbalance Issue in Software Vulnerability Detection

Cited by: 1
Authors
Guo, Yuejun [1]
Hu, Qiang [2]
Tang, Qiang [1]
Le Traon, Yves [2]
Affiliations
[1] Luxembourg Inst Sci & Technol, ITIS, Esch Sur Alzette, Luxembourg
[2] Univ Luxembourg, SnT, Luxembourg, Luxembourg
Source
COMPUTER SECURITY - ESORICS 2023, PT IV | 2024 / Vol. 14347
Keywords
Software security; Vulnerability detection; Deep learning; Imbalance;
DOI
10.1007/978-3-031-51482-1_19
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Vulnerability detection is crucial to protecting software security. Deep learning (DL) is currently the most promising technique for automating this task, owing to its ability to extract patterns and representations from large volumes of code. Despite this promise, DL-based vulnerability detection remains in its early stages, and model performance varies across datasets. Drawing on insights from well-explored application areas such as computer vision, we conjecture that the imbalance issue (the number of vulnerable code samples is extremely small) is at the core of this phenomenon. To validate this, we conduct a comprehensive empirical study involving nine open-source datasets and two state-of-the-art DL models. The results confirm our conjecture. We also obtain insightful findings on how existing imbalance solutions perform in vulnerability detection. It turns out that these solutions also perform differently across datasets and evaluation metrics. Specifically: 1) focal loss is more suitable for improving precision, 2) mean false error and class-balanced loss encourage recall, and 3) random over-sampling benefits the F1-measure. However, none of them excels across all metrics. To delve deeper, we explore external influences on these solutions and offer insights for developing new solutions.
Pages: 371-390
Page count: 20
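
For readers unfamiliar with the imbalance solutions and metrics named in the abstract, the following is a minimal Python sketch of two of them (focal loss and random over-sampling) together with precision, recall, and the F1-measure. It is not the study's code: the function names, the toy data, and the defaults `gamma=2.0` and `alpha=0.25` (the values commonly used in the original focal loss formulation) are assumptions chosen for illustration only.

```python
import math
import random


def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    """Binary focal loss for one sample (illustrative helper, not from the paper).

    p is the predicted probability of the positive (vulnerable) class,
    y is the true label (1 = vulnerable, 0 = non-vulnerable).
    """
    p_t = p if y == 1 else 1.0 - p              # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha  # per-class weighting factor
    p_t = min(max(p_t, eps), 1.0)               # avoid log(0)
    # (1 - p_t) ** gamma shrinks the loss of samples that are already easy.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        out_samples.extend(items + extra)
        out_labels.extend([label] * target)
    return out_samples, out_labels


def precision_recall_f1(y_true, y_pred):
    """Precision, recall, and F1 for the positive (vulnerable) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy imbalanced dataset: 9 non-vulnerable functions, 1 vulnerable one.
toy_samples = [f"func_{i}" for i in range(10)]
toy_labels = [0] * 9 + [1]
balanced_samples, balanced_labels = random_oversample(toy_samples, toy_labels)
```

In this sketch, over-sampling acts on the training data before a model sees it, whereas focal loss acts on the training objective; this is one reason such solutions can move precision, recall, and F1 in different directions.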