CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Times cited: 0
Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Academy of Sciences, Institute of Information Engineering, Beijing, People's Republic of China
[2] University of Chinese Academy of Sciences, School of Cyber Security, Beijing, People's Republic of China
Source
COMPUTER JOURNAL | 2024 / Vol. 67 / No. 04
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification (CLC) number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The keys to a deep learning vulnerability detection framework are pre-processing the source code and learning vulnerability features. Traditional source code representation techniques completely normalize user-defined symbols and thereby discard the semantic information associated with vulnerabilities. The current mainstream model for learning vulnerability features is the Recurrent Neural Network (RNN), whose sequential, time-step structure limits its ability to capture long-distance information. This paper proposes a new vulnerability detection framework that addresses both problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered with the unsupervised K-means algorithm, and each symbol is normalized according to the cluster it falls in. This preserves the primary semantic information in the source code while keeping the sample data smooth. In the feature extraction phase, the text representation of the source code is fed into Bidirectional Encoder Representations from Transformers (BERT) for automated feature learning, which strengthens semantic information extraction and long-distance information acquisition. Experimental results on real-world data that we collected show that the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, the precision of our method is 4.2% higher than that of the state-of-the-art method.
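The abstract does not say how user-defined symbols are represented before clustering or how the normalized tokens are named, so the following is only a minimal Python sketch of the general idea of clustering-based normalization, not the CDNM implementation. Assumptions (all hypothetical): identifiers are embedded as character-bigram count vectors as a stand-in for whatever features the authors use, K-means is seeded by deterministic farthest-first selection instead of random or k-means++ seeding, and each symbol is renamed to a `C<cluster>_<index>` token. The helper names `bigrams`, `kmeans` and `cluster_normalize` are illustrative.

```python
from collections import Counter


def bigrams(name):
    """Character bigrams of an identifier, padded with ^ and $ boundary marks."""
    padded = "^" + name.lower() + "$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50):
    """Plain K-means (Lloyd's algorithm) returning one cluster label per point."""
    # Assumption: farthest-first initialisation, a deterministic stand-in
    # for the random or k-means++ seeding a real implementation would use.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each non-empty cluster's centroid to its mean.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_normalize(tokens, user_symbols, k):
    """Rename user-defined symbols to hypothetical C<cluster>_<index> tokens."""
    # Distinct user-defined symbols in order of first appearance.
    names = list(dict.fromkeys(t for t in tokens if t in user_symbols))
    if not names:
        return list(tokens)
    vocab = sorted({bg for n in names for bg in bigrams(n)})
    vectors = []
    for n in names:
        counts = Counter(bigrams(n))
        vectors.append([counts[bg] for bg in vocab])
    labels = kmeans(vectors, min(k, len(names)))
    # Number symbols within each cluster so distinct symbols stay distinct.
    mapping, next_index = {}, Counter()
    for name, lab in zip(names, labels):
        mapping[name] = f"C{lab}_{next_index[lab]}"
        next_index[lab] += 1
    # Keywords, library calls and punctuation pass through unchanged.
    return [mapping.get(t, t) for t in tokens]


if __name__ == "__main__":
    stmt = ["memcpy", "(", "buf", ",", "bufs", ",", "len", ")", ";",
            "lens", "=", "len", ";"]
    print(cluster_normalize(stmt, {"buf", "bufs", "len", "lens"}, k=2))
```

Running the script renames `buf`/`bufs` to `C0_0`/`C0_1` and `len`/`lens` to `C1_0`/`C1_1`, while `memcpy` and punctuation are left alone. The cluster prefix keeps the buffer-like and length-like symbols in separate groups, whereas a complete normalization would give all four uniform placeholder names regardless of meaning.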
Pages: 1538-1549
Page count: 12
Related papers (50 in total)
  • [1] Clustering-Based Subgroup Detection for Automated Fairness Analysis
    Schaefer, Jero
    Wiese, Lena
    NEW TRENDS IN DATABASE AND INFORMATION SYSTEMS, ADBIS 2022, 2022, 1652 : 45 - 55
  • [2] A Clustering-Based Method for Intrusion Detection in Web Servers
    Pereira, Hermano
    Jamhour, Edgard
    2013 20TH INTERNATIONAL CONFERENCE ON TELECOMMUNICATIONS (ICT), 2013,
  • [3] A Hybrid Unsupervised Clustering-Based Anomaly Detection Method
    Pu, Guo
    Wang, Lijuan
    Shen, Jun
    Dong, Fang
    TSINGHUA SCIENCE AND TECHNOLOGY, 2021, 26 (02) : 146 - 153
  • [4] Data Clustering-based Anomaly Detection in Industrial Control Systems
    Kiss, Istvan
    Genge, Bela
    Haller, Piroska
    Sebestyen, Gheorghe
    2014 IEEE INTERNATIONAL CONFERENCE ON INTELLIGENT COMPUTER COMMUNICATION AND PROCESSING (ICCP), 2014, pp. 275+
  • [5] A clustering-based method for outlier detection under concept drift
    Tahir, Mahjabeen
    Abdullah, Azizol
    Udzir, Nur Izura
    Kasmiran, Khairul Azhar
    MEHRAN UNIVERSITY RESEARCH JOURNAL OF ENGINEERING AND TECHNOLOGY, 2024, 43 (03) : 205 - 218
  • [6] A Clustering-Based Method to Anomaly Detection in Thermal Power Plants
    Drapal, Patricia
    Clemente, Jullya
    Reyes, Dailys Maite
    de Souza, Starch Melo
    Lins, Anthony
    Prudencio, Ricardo B. C.
    2022 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2022,
  • [7] CLUSTERING-BASED SUBSET ENSEMBLE LEARNING METHOD FOR IMBALANCED DATA
    Hu, Xiao-Sheng
    Zhang, Run-Jing
    PROCEEDINGS OF 2013 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS (ICMLC), VOLS 1-4, 2013, pp. 35 - 39
  • [8] Clustering-based Automated Requirement Trace Retrieval
    Al-walidi, Nejood Hashim
    Azab, Shahira Shaaban
    Khamis, Abdelaziz
    Darwish, Nagy Ramadan
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (12) : 783 - 792
  • [9] Clustering-Based Trajectory Outlier Detection
    Eldawy, Eman O.
    Mokhtar, Hoda M. O.
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (05) : 133 - 139
  • [10] Clustering-based KPI Data Association Analysis Method in Cellular Networks
    Guo, Xingyu
    Yu, Peng
    Li, Wenjing
    Qiu, Xuesong
    NOMS 2016 - 2016 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, 2016, pp. 1101 - 1104