CDNM: Clustering-Based Data Normalization Method For Automated Vulnerability Detection

Cited by: 0
Authors
Wu, Tongshuai [1, 2]
Chen, Liwei [1, 2]
Du, Gewangzi [1, 2]
Zhu, Chenguang [1, 2]
Cui, Ningning [1, 2]
Shi, Gang [1, 2]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Data Normalization; Clustering; Vulnerability Detection; Deep Learning;
DOI
10.1093/comjnl/bxad080
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
The key components of a deep learning vulnerability detection framework are source code pre-processing and vulnerability feature learning. Traditional source code representation techniques fully normalize user-defined symbols and thereby discard the semantic information associated with vulnerabilities. The current mainstream model for vulnerability feature learning is the Recurrent Neural Network (RNN), whose sequential structure limits its ability to capture long-range information. This paper proposes a new vulnerability detection framework to address these problems. In the source code pre-processing phase, we propose a new data normalization method: user-defined symbols are clustered with the unsupervised K-means algorithm and then normalized by class according to the clustering results, which preserves the primary semantic information in the source code and ensures the smoothness of the sample data. In the feature extraction stage, we feed the text representation of the source code into Bidirectional Encoder Representations from Transformers (BERT) for automatic feature learning, which enhances semantic information extraction and long-range information acquisition. Experimental results show that, on real-world data we collected ourselves, the vulnerability detection precision of this method is 18.3% higher than that of the current mainstream vulnerability detection framework. Further, our method improves on the precision of the state-of-the-art method by 4.2%.
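The normalization step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how symbols are turned into vectors, so the hashed character-bigram embedding (`embed`), the vector size `DIM`, the replacement token format `SYM_<cluster>`, and the helper names `kmeans` and `normalize` are all assumptions made for illustration. The K-means here is a plain Lloyd's iteration written with the standard library only.

```python
import re
import zlib

DIM = 16  # size of the hashed feature vector (assumption, not from the paper)


def embed(name):
    """Hash the character bigrams of an identifier into a fixed-size count vector."""
    vec = [0.0] * DIM
    padded = f"^{name.lower()}$"
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2].encode()
        vec[zlib.crc32(bigram) % DIM] += 1.0
    return vec


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Lloyd's algorithm, initialised deterministically with the first k points."""
    k = min(k, len(points))
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def normalize(code, identifiers, k):
    """Replace each user-defined symbol with a token naming its cluster."""
    names = sorted(set(identifiers))
    if not names:
        return code, {}
    labels = kmeans([embed(n) for n in names], k)
    mapping = {n: f"SYM_{lab}" for n, lab in zip(names, labels)}
    # Whole-word matching keeps library names such as strlen untouched.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, names)) + r")\b")
    return pattern.sub(lambda m: mapping[m.group(1)], code), mapping


if __name__ == "__main__":
    snippet = "char buf[10]; int len = strlen(src); memcpy(buf, src, len);"
    normalized, mapping = normalize(snippet, ["buf", "len", "src"], k=2)
    print(mapping)
    print(normalized)
```

Compared with mapping every symbol to one generic placeholder, a per-cluster token keeps a coarse distinction between groups of symbols, which is the property the abstract attributes to the method. The normalized text would then be tokenized and passed to a BERT model, a step this sketch leaves out.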
Pages: 1538-1549
Number of pages: 12
Related papers
50 in total
  • [31] A Clustering-Based Method for Team Formation in Learning Environments
    Guijarro-Mata-Garcia, Marta
    Guijarro, Maria
    Fuentes-Fernandez, Ruben
    Hybrid Artificial Intelligent Systems, 2016, 9648 : 475 - 486
  • [32] Clustering-Based Semantic Web Service Matchmaking with Automated Knowledge Acquisition
    Liu, Peng
    Zhang, Jingyu
    Yu, Xueli
    WEB INFORMATION SYSTEMS AND MINING, PROCEEDINGS, 2009, 5854 : 261 - +
  • [33] Ensemble Classification for Anomalous Propagation Echo Detection with Clustering-Based Subset-Selection Method
    Lee, Hansoo
    Kim, Sungshin
    ATMOSPHERE, 2017, 8 (01):
  • [34] Clustering-based Shadow Edge Detection in a Single Color Image
    Wang Shiting
    Zheng Hong
    PROCEEDINGS 2013 INTERNATIONAL CONFERENCE ON MECHATRONIC SCIENCES, ELECTRIC ENGINEERING AND COMPUTER (MEC), 2013, : 1038 - 1041
  • [35] Object detection by clustering-based nonparametric kernel density estimation
    Hu, D.
    Hu, J.
    INFORMATION SCIENCE AND MANAGEMENT ENGINEERING, VOLS 1-3, 2014, 46 : 1867 - 1872
  • [36] Clustering-based hybrid resampling techniques for social lending data
    Jadwal P.K.
    Jain S.
    Agarwal B.
    International Journal of Intelligent Systems Technologies and Applications, 2021, 20 (03) : 183 - 198
  • [37] Clustering-Based Predictive Analytics to Improve Scientific Data Discovery
    Devarakonda, Ranjeet
    Kumar, Jitendra
    Prakash, Giri
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 5658 - 5661
  • [38] Clustering-Based Hybrid Approach for Multivariate Missing Data Imputation
    Dubey, Aditya
    Rasool, Akhtar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (11) : 710 - 714
  • [39] GOAL: a clustering-based method for the group optimal location problem
    Chen, Fangshu
    Qi, Jianzhong
    Lin, Huaizhong
    Gao, Yunjun
    Lu, Dongming
    KNOWLEDGE AND INFORMATION SYSTEMS, 2019, 61 (02) : 873 - 903
  • [40] Clustering-based recommendation method with enhanced grasshopper optimisation algorithm
    Zhao, Zihao
    Xia, Yingchun
    Xu, Wenjun
    Yu, Hui
    Yang, Shuai
    Chen, Cheng
    Yuan, Xiaohui
    Zhou, Xiaobo
    Wang, Qingyong
    Gu, Lichuan
    CAAI TRANSACTIONS ON INTELLIGENCE TECHNOLOGY, 2025,