LIVABLE: Exploring Long-Tailed Classification of Software Vulnerability Types

Cited by: 4
Authors
Wen, Xin-Cheng [1]
Gao, Cuiyun [1, 2, 3]
Luo, Feng [1]
Wang, Haoyu [4]
Li, Ge [5]
Liao, Qing [1]
Affiliations
[1] Harbin Inst Technol, Shenzhen 518055, Peoples R China
[2] Peng Cheng Lab, Shenzhen 518055, Peoples R China
[3] Guangdong Prov Key Lab Novel Secur Intelligence Te, Shenzhen 518055, Peoples R China
[4] Huazhong Univ Sci & Technol, Wuhan 430074, Peoples R China
[5] Peking Univ, Beijing 100871, Peoples R China
Keywords
Tail; Codes; Software; Representation learning; Training; Source coding; Graph neural networks; Software vulnerability; deep learning; graph neural network;
DOI
10.1109/TSE.2024.3382361
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Prior studies generally focus on software vulnerability detection and have demonstrated the effectiveness of Graph Neural Network (GNN)-based approaches for the task. Because software vulnerabilities come in many types with different degrees of severity, it is also beneficial for developers to determine the type of each piece of vulnerable code. In this paper, we observe that the distribution of vulnerability types is long-tailed in practice: a small portion of classes have massive numbers of samples (i.e., head classes), while the others contain only a few samples (i.e., tail classes). Directly adopting previous vulnerability detection approaches tends to result in poor performance, mainly for two reasons. First, it is difficult to learn the vulnerability representation effectively due to the over-smoothing issue of GNNs. Second, vulnerability types in the tail are hard to predict because they have extremely few associated samples. To alleviate these issues, we propose a Long-taIled software VulnerABiLity typE classification approach, called LIVABLE. LIVABLE mainly consists of two modules: (1) a vulnerability representation learning module, which improves the propagation steps in the GNN to distinguish node representations through a differentiated propagation method, and which also involves a sequence-to-sequence model to enhance the vulnerability representations; and (2) an adaptive re-weighting module, which uses a novel training loss to adjust the learning weights for different types according to the training epoch and the number of associated samples. We verify the effectiveness of LIVABLE in both type classification and vulnerability detection tasks. For vulnerability type classification, experiments on the Fan et al. dataset show that LIVABLE outperforms the state-of-the-art methods by 24.18% in terms of accuracy, and also improves the performance in predicting tail classes by 7.7%.
To evaluate the efficacy of the vulnerability representation learning module in LIVABLE, we further compare it with recent vulnerability detection approaches on three benchmark datasets. The results show that the proposed representation learning module improves on the best baselines by 4.03% on average in terms of accuracy.
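The abstract describes the adaptive re-weighting module only at a high level, so the following is a minimal illustrative sketch of the general idea (class weights that depend on both the training epoch and the per-class sample count), not the loss defined in the paper. The linear schedule, the inverse-frequency form, and the names `class_weights` and `weighted_cross_entropy` are all assumptions made for illustration.

```python
import math

def class_weights(class_counts, epoch, total_epochs):
    """Hypothetical schedule: weights drift from uniform to inverse-frequency."""
    # alpha ramps linearly from 0 (first epoch) to 1 (last epoch); assumed, not from the paper
    alpha = epoch / max(total_epochs - 1, 1)
    raw = [(1.0 / n) ** alpha for n in class_counts]
    mean = sum(raw) / len(raw)
    # Normalise so the average weight stays at 1 throughout training
    return [w / mean for w in raw]

def weighted_cross_entropy(logits, label, weights):
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    m = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -weights[label] * (logits[label] - log_z)

# Toy long-tailed setting: one head class, one medium class, one tail class
counts = [900, 90, 10]
early = class_weights(counts, epoch=0, total_epochs=10)
late = class_weights(counts, epoch=9, total_epochs=10)
```

In this sketch every class is weighted equally at the first epoch, and by the last epoch the tail class receives the largest weight, so errors on rare vulnerability types contribute more to the loss late in training.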
Pages: 1325-1339
Number of pages: 15