Exploring Transformers for Multi-Label Classification of Java Vulnerabilities

Cited by: 8

Authors:

Mamede, Claudia [1]

Pinconschi, Eduard [1]

Abreu, Rui [1,2]

Campos, Jose [1,3]

Affiliations:

[1] Univ Porto, Fac Engn, Porto, Portugal

[2] INESC ID, Porto, Portugal

[3] Univ Lisbon, LASIGE, Fac Sci, Lisbon, Portugal

Source:

2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS | 2022

Keywords:

Vulnerability detection; transformer; multi-label classification; bias; generalizability

DOI:

10.1109/QRS57517.2022.00015

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

Deep learning (DL) techniques have demonstrated potential in reasoning about complex patterns of vulnerable code from high-level abstractions. Recent advancements in the area, such as the introduction of transformer-based models like BERT, help overcome the problem that available vulnerability detection datasets are too small for most DL models to capture all relevant patterns. They mitigate the challenge by leveraging knowledge from a general domain to solve problems in specific domains. In this paper, we explore different BERT-based models for multi-label classification of vulnerabilities in Java on a synthetic dataset. The models yield up to 99% accuracy and 94% F1-score. We remove biases in the training dataset and observe drops of up to 13% in the F1-score. We further assess the generalizability of the models on realistic samples and notice that one model, in particular, predicted unknown vulnerabilities with an F1-score of nearly 85%.

Pages: 43-52

Page count: 10
