CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Cited by: 0
Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / Issue 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Deep learning vulnerability detection frameworks depend on two steps: pre-processing source code and learning vulnerability features. Traditional source code representation techniques fully normalize user-defined symbols and thereby discard semantic information associated with vulnerabilities. The current mainstream model for learning vulnerability features is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address both problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered with the unsupervised K-means algorithm and then normalized by class according to the clustering results, which preserves the primary semantic information in the source code and ensures the smoothness of the sample data. In the feature extraction stage, we feed the text representation of the source code into Bidirectional Encoder Representations from Transformers (BERT) for automated feature learning, which improves the extraction of semantic information and long-range information. Experimental results on real-world data that we collected show that the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Our method also improves on the precision of the state-of-the-art method by 4.2%.
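The normalization step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which features are used to cluster user-defined symbols or how the normalized names are formed, so everything below is an assumption made for illustration: identifiers are represented by character-bigram counts, clustered with a small hand-written K-means, and renamed to hypothetical placeholders of the form `C<cluster>_VAR<n>`. The `RESERVED` set, the function names, and the placeholder scheme are not taken from the paper.

```python
import random
import re
from collections import Counter

# Hypothetical list of tokens that are NOT user-defined symbols.
RESERVED = {"int", "char", "void", "if", "else", "for", "while", "return",
            "sizeof", "strcpy", "strlen", "memcpy", "malloc", "free"}
IDENT = re.compile(r"\b[A-Za-z_]\w*\b")

def bigram_counts(name):
    """Bag of character bigrams: an assumed stand-in feature for a symbol."""
    s = name.lower()
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def vectorize(names):
    """Turn each name into a count vector over the shared bigram vocabulary."""
    vocab = sorted({g for n in names for g in bigram_counts(n)})
    return [[bigram_counts(n)[g] for g in vocab] for n in names]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd-style K-means; returns one cluster label per point."""
    if not points:
        return []
    k = min(k, len(points))
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def normalize(code, k=2):
    """Rename user-defined symbols to per-cluster placeholders.

    String literals and comments are not handled in this sketch.
    """
    idents = sorted(set(IDENT.findall(code)) - RESERVED)
    labels = kmeans(vectorize(idents), k)
    per_cluster = Counter()
    mapping = {}
    for name, label in zip(idents, labels):
        per_cluster[label] += 1
        mapping[name] = f"C{label}_VAR{per_cluster[label]}"
    normalized = IDENT.sub(lambda m: mapping.get(m.group(0), m.group(0)), code)
    return normalized, mapping

if __name__ == "__main__":
    sample = "char dst_buf[8]; int user_len = strlen(user_buf); strcpy(dst_buf, user_buf);"
    text, names = normalize(sample, k=2)
    print(text)
    print(names)
```

Unlike a complete normalization that maps every symbol to the same generic token sequence, the cluster index in each placeholder retains a coarse hint about the original name. In the framework described above, the normalized text would then be tokenized and passed to BERT; that stage is not sketched here.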
Pages: 1538 - 1549
Number of pages: 12
Related papers
50 in total
  • [21] Contrastive Clustering-Based Patient Normalization to Improve Automated In Vivo Oral Cancer Diagnosis from Multispectral Autofluorescence Lifetime Images
    Caughlin, Kayla
    Duran-Sierra, Elvis
    Cheng, Shuna
    Cuenca, Rodrigo
    Ahmed, Beena
    Ji, Jim
    Martinez, Mathias
    Al-Khalil, Moustafa
    Al-Enazi, Hussain
    Jo, Javier A.
    Busso, Carlos
    CANCERS, 2024, 16 (23)
  • [22] A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection
    Yang, Chaofan
    Liu, Guanjun
    Yan, Chungang
    Jiang, Changjun
    SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (12)
  • [23] A clustering-based flexible weighting method in AdaBoost and its application to transaction fraud detection
    Chaofan Yang
    Guanjun Liu
    Chungang Yan
    Changjun Jiang
    Science China Information Sciences, 2021, 64
  • [24] Clustering-based visualizations for diagnosing diseases on metagenomic data
    Nguyen, Hai Thanh
    Phan, Trang Huyen
    Pham, Linh Thuy Thi
    Pham, Ngoc Huynh
    SIGNAL IMAGE AND VIDEO PROCESSING, 2024, 18 (8-9) : 5685 - 5699
  • [25] Clustering-based incremental learning for imbalanced data classification
    Liu, Yuxin
    Du, Guangyu
    Yin, Chenke
    Zhang, Haichao
    Wang, Jia
    KNOWLEDGE-BASED SYSTEMS, 2024, 292
  • [26] Clustering-based undersampling in class-imbalanced data
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Hu, Ya-Han
    Jhang, Jing-Shang
    INFORMATION SCIENCES, 2017, 409 : 17 - 26
  • [27] A Clustering-Based Data Reduction for the Large Automotive Datasets
    Siwek, Patryk
    Skruch, Pawel
    Dlugosz, Marek
    2023 27TH INTERNATIONAL CONFERENCE ON METHODS AND MODELS IN AUTOMATION AND ROBOTICS, MMAR, 2023 : 234 - 239
  • [28] Clustering-based analysis for residential district heating data
    Gianniou, Panagiota
    Liu, Xiufeng
    Heller, Alfred
    Nielsen, Per Sieverts
    Rode, Carsten
    ENERGY CONVERSION AND MANAGEMENT, 2018, 165 : 840 - 850
  • [29] Clustering-Based Semantic Web Service Matchmaking with Automated Knowledge Acquisition
    Liu, Peng
    Zhang, Jingyu
    Yu, Xueli
    WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 261 - +
  • [30] A clustering-based method to detect functional connectivity differences
    Chen, Gang
    Ward, B. Douglas
    Xie, Chunming
    Li, Wenjun
    Chen, Guangyu
    Goveas, Joseph S.
    Antuono, Piero G.
    Li, Shi-Jiang
    NEUROIMAGE, 2012, 61 (01) : 56 - 61