Exploring Transformers for Multi-Label Classification of Java Vulnerabilities

Cited by: 8
Authors
Mamede, Claudia [1]
Pinconschi, Eduard [1]
Abreu, Rui [1, 2]
Campos, Jose [1, 3]
Affiliations
[1] Univ Porto, Fac Engn, Porto, Portugal
[2] INESC ID, Porto, Portugal
[3] Univ Lisbon, LASIGE, Fac Sci, Lisbon, Portugal
Source
2022 IEEE 22ND INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY, QRS | 2022
Keywords
Vulnerability detection; transformer; multi-label classification; bias; generalizability
DOI
10.1109/QRS57517.2022.00015
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Deep learning (DL) techniques have demonstrated potential in reasoning about complex patterns of vulnerable code from high-level abstractions. Recent advances in the area, such as the introduction of transformer-based models like BERT, help overcome the problem that available vulnerability detection datasets are too small for most DL models to capture all relevant patterns. They mitigate the challenge by leveraging knowledge from a general domain to solve problems in specific domains. In this paper, we explore different BERT-based models for multi-label classification of vulnerabilities in Java on a synthetic dataset. The models yield up to 99% accuracy and a 94% F1-score. We remove biases in the training dataset and observe drops of up to 13% in F1-score. We further assess the generalizability of the models on realistic samples and notice that one model, in particular, predicted unknown vulnerabilities with an F1-score of nearly 85%.
Pages: 43-52
Number of pages: 10
Related papers
38 in total
  • [1] Learning curve models and applications: Literature review and research directions
    Anzanello, Michel Jose
    Fogliatto, Flavio Sanson
    [J]. INTERNATIONAL JOURNAL OF INDUSTRIAL ERGONOMICS, 2011, 41 (05) : 573 - 583
  • [2] Chakraborty Saikat, 2020, DEEP LEARNING BASED, P9
  • [3] Static analysis for security
    Chess, B
    McGraw, G
    [J]. IEEE SECURITY & PRIVACY, 2004, 2 (06) : 76 - 79
  • [4] JavaBERT: Training a transformer-based model for the Java programming language
    De Sousa, Nelson Tavares
    Hasselbring, Wilhelm
    [J]. 2021 36TH IEEE/ACM INTERNATIONAL CONFERENCE ON AUTOMATED SOFTWARE ENGINEERING WORKSHOPS (ASEW 2021), 2021, : 90 - 95
  • [5] Bringing Transparency Design into Practice
    Eiband, Malin
    Schneider, Hanna
    Bilandzic, Mark
    Fazekas-Con, Julian
    Haug, Mareike
    Hussmann, Heinrich
    [J]. IUI 2018: PROCEEDINGS OF THE 23RD INTERNATIONAL CONFERENCE ON INTELLIGENT USER INTERFACES, 2018, : 211 - 223
  • [6] Fallah Haytame., 2022, CIRCLE JOINT C INFOR
  • [7] Feng Z., 2020, Codebert: A PreTrained Model for Programming and Natural Languages, P1536
  • [8] LineVul: A Transformer-based Line-Level Vulnerability Prediction
    Fu, Michael
    Tantithamthavorn, Chakkrit
    [J]. 2022 MINING SOFTWARE REPOSITORIES CONFERENCE (MSR 2022), 2022, : 608 - 620
  • [9] Software Vulnerability Analysis and Discovery Using Machine-Learning and Data-Mining Techniques: A Survey
    Ghaffarian, Seyed Mohammad
    Shahriari, Hamid Reza
    [J]. ACM COMPUTING SURVEYS, 2017, 50 (04)
  • [10] The rise of software vulnerability: Taxonomy of software vulnerabilities detection and machine learning approaches
    Hanif, Hazim
    Nasir, Mohd Hairul Nizam Md
    Ab Razak, Mohd Faizal
    Firdaus, Ahmad
    Anuar, Nor Badrul
    [J]. JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2021, 179