Vulnerability prediction using pre-trained models: An empirical evaluation

Cited by: 1
Authors
Kalouptsoglou, Ilias [1 ,2 ]
Siavvas, Miltiadis [1 ]
Ampatzoglou, Apostolos [2 ]
Kehagias, Dionysios [1 ]
Chatzigeorgiou, Alexander [2 ]
Affiliations
[1] Ctr Res & Technol Hellas, Informat Technol Inst, Thessaloniki, Greece
[2] Univ Macedonia, Dept Appl Informat, Thessaloniki, Greece
Source
2024 32ND INTERNATIONAL CONFERENCE ON MODELING, ANALYSIS AND SIMULATION OF COMPUTER AND TELECOMMUNICATION SYSTEMS, MASCOTS 2024 | 2024
Keywords
Software security; Vulnerability prediction; Transfer learning; Large language models; Transformer;
DOI
10.1109/MASCOTS64422.2024.10786510
Chinese Library Classification (CLC) Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Code
0812 ;
Abstract
The rise of Large Language Models (LLMs) has opened new directions for downstream text classification tasks such as vulnerability prediction, where segments of source code are classified as vulnerable or not. Several recent studies have employed transfer learning to enhance vulnerability prediction by taking advantage of the prior knowledge of pre-trained LLMs. In the current study, different Transformer-based pre-trained LLMs are examined and evaluated with respect to their capacity to predict vulnerable software components. In particular, we fine-tune the BERT, GPT-2, and T5 models, as well as their code-oriented variants, namely CodeBERT, CodeGPT, and CodeT5, respectively. Subsequently, we assess their performance and conduct an empirical comparison between them to identify the most accurate models for vulnerability prediction.
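The empirical comparison described in the abstract scores each fine-tuned model on held-out code segments labeled vulnerable or not. A minimal plain-Python sketch of the standard binary-classification metrics such an evaluation relies on (the function names and sample labels below are illustrative, not taken from the paper):

```python
def confusion_counts(y_true, y_pred):
    """Tally the confusion matrix for binary labels (1 = vulnerable, 0 = clean)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def vulnerability_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for a vulnerability classifier."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Illustrative ground-truth and predicted labels for eight code segments.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
print(vulnerability_metrics(y_true, y_pred))
```

F1 is a natural headline metric here because vulnerable components are typically a small minority class, so raw accuracy alone can look deceptively high.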
Pages: 200-205
Page count: 6
Cited References
37 records in total (10 listed here)
[1] Bagheri A., 2021, Quality of Information and Communications Technology.
[2] Chowdhury, Istehad; Zulkernine, Mohammad. Using complexity, coupling, and cohesion metrics as early indicators of vulnerabilities. Journal of Systems Architecture, 2011, 57(03): 294-313.
[3] Cohen J., 1988, Statistical Power Analysis for the Behavioral Sciences, 2nd ed. DOI: 10.1007/978-1-4684-5439-0_2; 10.4324/9780203771587.
[4] Coimbra D, 2021, arXiv, DOI: arXiv:2106.01367.
[5] Devlin J, 2019, arXiv, DOI: arXiv:1810.04805.
[6] Fang, Yong; Liu, Yongcheng; Huang, Cheng; Liu, Liang. FastEmbed: Predicting vulnerability exploitation possibility based on ensemble machine learning algorithm. PLOS ONE, 2020, 15(02).
[7] fastText.
[8] Feng ZY, 2020, arXiv, DOI: arXiv:2002.08155.
[9] Filus K., 2020, S MOD AN SIM COMP TE, P102.
[10] Fu, Michael; Tantithamthavorn, Chakkrit. LineVul: A Transformer-based Line-Level Vulnerability Prediction Approach. 2022 Mining Software Repositories Conference (MSR 2022): 608-620.