VulExplainer: A Transformer-Based Hierarchical Distillation for Explaining Vulnerability Types

Cited by: 13
Authors
Fu, Michael [1 ]
Nguyen, Van [1 ]
Tantithamthavorn, Chakkrit [1 ]
Le, Trung [1 ]
Phung, Dinh [1 ]
Affiliations
[1] Monash Univ, Fac Informat Technol, Melbourne, Australia
Keywords
Software vulnerability; software security; CLASSIFICATION;
DOI
10.1109/TSE.2023.3305244
CLC classification: TP31 [Computer Software];
Discipline codes: 081202; 0835;
Abstract
Deep learning-based vulnerability prediction approaches have been proposed to help under-resourced security practitioners detect vulnerable functions. However, these approaches do not tell practitioners what type of vulnerability (i.e., CWE-ID) a given prediction corresponds to. Thus, a novel approach that explains the type of vulnerability for a given prediction is imperative. In this paper, we propose VulExplainer, an approach to explain the type of vulnerabilities, which we formulate as a vulnerability classification task. However, vulnerabilities have diverse characteristics (i.e., CWE-IDs) and the number of labeled samples per CWE-ID is highly imbalanced (a highly imbalanced multi-class classification problem), which often leads to inaccurate predictions. Thus, we introduce a Transformer-based hierarchical distillation for software vulnerability classification to address the highly imbalanced types of software vulnerabilities. Specifically, we split the complex label distribution into sub-distributions based on CWE abstract types (i.e., categorizations that group similar CWE-IDs), so that similar CWE-IDs are grouped together and each group has a more balanced label distribution. We train a TextCNN teacher on each simplified distribution; however, each teacher performs well only within its own group. Thus, we build a Transformer student model that generalizes the performance of the TextCNN teachers through our hierarchical knowledge distillation framework. In an extensive evaluation on 8,636 real-world vulnerabilities, our approach outperforms all of the baselines by 5%-29%. The results also demonstrate that our approach can be applied to Transformer-based architectures such as CodeBERT, GraphCodeBERT, and CodeGPT. Moreover, our method maintains compatibility with any Transformer-based model without requiring architectural modifications: it only adds a special distillation token to the input.
These results highlight our significant contributions towards the fundamental and practical problem of explaining software vulnerability types.
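The core idea of the abstract — a single student imitating several group-specific teachers — can be sketched as a multi-teacher distillation loss. This is a minimal, hypothetical illustration in plain Python (not the paper's actual implementation, which additionally uses a special distillation token and a Transformer student): the student's temperature-softened predictions are pulled toward each TextCNN teacher's softened predictions via an averaged KL divergence.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax: larger T softens the distribution,
    # exposing the teacher's "dark knowledge" about similar classes.
    exps = [math.exp(x / T) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_div(p, q):
    # KL(p || q): how much the student distribution q diverges
    # from the teacher distribution p.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def hierarchical_distillation_loss(student_logits, teacher_logits_list, T=2.0):
    """Average KL divergence between the student and each group teacher,
    scaled by T^2 as in standard knowledge distillation. Hypothetical
    simplification: VulExplainer itself combines teacher signals through
    its hierarchical framework rather than a plain average."""
    q = softmax(student_logits, T)
    losses = [kl_div(softmax(t, T), q) for t in teacher_logits_list]
    return (T * T) * sum(losses) / len(losses)
```

For example, a student whose logits match its only teacher incurs (near-)zero loss, while a student that disagrees with a teacher is penalized; in training, this loss would be combined with the usual cross-entropy on the ground-truth CWE-ID labels.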
Pages: 4550-4565 (16 pages)