SbrPBert: A BERT-Based Model for Accurate Security Bug Report Prediction

Cited by: 1
Authors
Cao, Xudong [1]
Liu, Tianwei [2]
Zhang, Jianyuan [3]
Feng, Mengyue [1]
Zhang, Xin [4]
Cao, Wanying [1]
Sun, Hongyu [2]
Zhang, Yuqing [1]
Affiliations
[1] Univ Chinese Acad Sci, Natl Comp Network Intrus Protect Ctr, Beijing, Peoples R China
[2] Xidian Univ, Sch Cyber Engn, Xian, Peoples R China
[3] Lanzhou Univ Technol, Sch Comp & Commun, Lanzhou, Peoples R China
[4] Sch Cyberspace Secur, Xian Univ Posts & Telecommun, Xian, Peoples R China
Source
52ND ANNUAL IEEE/IFIP INTERNATIONAL CONFERENCE ON DEPENDABLE SYSTEMS AND NETWORKS WORKSHOP VOLUME (DSN-W 2022) | 2022
Funding
National Natural Science Foundation of China
Keywords
deep learning; Bert; security bug report; vulnerability
DOI
10.1109/DSN-W54100.2022.00030
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Bidirectional Encoder Representations from Transformers (BERT) has achieved impressive performance in several Natural Language Processing (NLP) tasks. However, there has been limited investigation of guidelines for adapting it to specialized fields. Here we focus on the software security domain. Early identification of security-related reports among software bug reports is one of the essential means of preventing security incidents. However, the prediction of security bug reports (SBRs) is limited by the scarcity and imbalance of samples in this field and by the complex characteristics of SBRs. Motivated by this, we constructed the largest dataset in this field and proposed a Security Bug Report Prediction Model Based on BERT (SbrPBert). By introducing a layer-wise learning rate decay strategy and a fine-tuning method that freezes some layers, our model outperforms the baseline model on both our dataset and other small-sample datasets. This indicates the practical value of the model for bug tracking systems or projects that lack samples. Moreover, deployed on the Mozilla and RedHat projects, our model has detected 56 hidden vulnerabilities so far.
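The abstract names two fine-tuning ideas, a per-layer learning rate that shrinks toward the lower layers and freezing of some layers, without giving any settings. The sketch below is only an illustration of how such a schedule could be computed. The function name, the 12-layer encoder, the base rate of 2e-5, the decay factor of 0.95, and the choice to freeze the four lowest layers are all assumptions made for this example and are not taken from the paper.

```python
def layerwise_learning_rates(num_layers, base_lr, decay, num_frozen=0):
    """Return one learning rate per encoder layer (index 0 = lowest layer).

    The top layer receives base_lr, and each layer below it is scaled by
    one more factor of `decay`. The lowest `num_frozen` layers receive 0.0,
    which stands in for "frozen" in this sketch.
    All values passed in are hypothetical, not the paper's settings.
    """
    if not 0 <= num_frozen <= num_layers:
        raise ValueError("num_frozen must be between 0 and num_layers")
    rates = []
    for layer in range(num_layers):
        if layer < num_frozen:
            rates.append(0.0)  # frozen layer: no update
        else:
            depth_from_top = num_layers - 1 - layer
            rates.append(base_lr * decay ** depth_from_top)
    return rates


# Hypothetical configuration: 12 encoder layers, 4 lowest layers frozen.
rates = layerwise_learning_rates(num_layers=12, base_lr=2e-5, decay=0.95, num_frozen=4)
for index, rate in enumerate(rates):
    print(f"layer {index:2d}: lr = {rate:.3e}")
```

In a training framework, each of these rates would typically be attached to the parameter group of the corresponding layer, and frozen layers would usually be excluded from the optimizer rather than given a zero rate.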
Pages: 129-134
Page count: 6