VulBERTa: Simplified Source Code Pre-Training for Vulnerability Detection

Times cited: 0
Authors
Hanif, Hazim [1, 2]
Maffeis, Sergio [1]
Affiliations
[1] Imperial Coll London, Dept Comp, London, England
[2] Univ Malaya, Fac Comp Sci & Informat Technol, Kuala Lumpur, Malaysia
Source
2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2022
Keywords
Vulnerability detection; Software vulnerabilities; Pre-training; Deep learning; Representation learning
DOI
10.1109/IJCNN55064.2022.9892280
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents VulBERTa, a deep learning approach to detect security vulnerabilities in source code. Our approach pre-trains a RoBERTa model with a custom tokenisation pipeline on real-world code from open-source C/C++ projects. The model learns a deep knowledge representation of the code syntax and semantics, which we leverage to train vulnerability detection classifiers. We evaluate our approach on binary and multi-class vulnerability detection tasks across several datasets (Vuldeepecker, Draper, REVEAL and muVuldeepecker) and benchmarks (CodeXGLUE and D2A). The evaluation results show that VulBERTa achieves state-of-the-art performance and outperforms existing approaches across different datasets, despite its conceptual simplicity and its limited cost in terms of training data size and number of model parameters.
Pages: 8