A cosine similarity-based labeling technique for vulnerability type detection using source codes

Cited by: 1
Authors
Ozturk, M. Maruf [1]
Affiliations
[1] Suleyman Demirel Univ, Engn & Nat Sci Fac, Dept Comp Engn, Isparta, Turkiye
Keywords
Vulnerability detection; Cosine similarity; Generalized linear model; Labeling; Text encoding
DOI
10.1016/j.cose.2024.104059
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Vulnerability detection is of great importance for the reliability of software systems. Although existing methods achieve remarkable success in vulnerability detection, they have two main disadvantages: (1) Irrelevant information is removed from source code, which has a high noise ratio, so that deep learning methods can be applied and experiments can report high accuracy. However, deep learning-based detection methods require large-scale datasets, which makes vulnerability detection computationally difficult for small-scale software systems. (2) The majority of studies perform feature selection by processing vulnerability commits. Despite considerable effort, few works detect vulnerabilities directly from source code. To address these two problems, this study proposes a novel labeling and vulnerability detection algorithm. The algorithm first analyzes source code with the help of a keyword vulnerability matrix. An encoded matrix is then generated with word2vec, and the labeling vector is combined with the source code matrix to produce a trainable dataset for a generalized linear model (GLM). Unlike preceding studies, our method performs vulnerability detection using source code alone, without requiring vulnerability commits. In addition, similar studies generally target a single programming language. In contrast, our study develops vulnerability keywords for three programming languages (C#, Java, and C++) and creates the corresponding labeling vectors from the keyword matrix. The proposed method outperformed the baseline approaches on most of the experimental datasets, with an area under the curve (AUC) above 90%. Further, there is an average margin of 7.7% between our method and the alternatives in Recall, Precision, and F1-score across five types of vulnerabilities.
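As a rough illustration of the cosine-similarity labeling idea described in the abstract, the sketch below scores a code snippet against per-type keyword lists and assigns the best-matching label. It is not the paper's algorithm: the keyword lists, the names `KEYWORDS`, `tokenize`, `cosine`, and `label`, the raw token counts (used here in place of word2vec encodings), and the 0.1 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists per vulnerability type. The actual keyword
# vulnerability matrix used in the paper is not given in the abstract.
KEYWORDS = {
    "buffer_overflow": ["strcpy", "gets", "memcpy", "sprintf"],
    "sql_injection": ["select", "executequery", "sqlcommand", "statement"],
    "command_injection": ["system", "exec", "processbuilder", "runtime"],
}


def tokenize(source):
    """Split source code into lowercase identifier-like tokens."""
    return [t.lower() for t in re.findall(r"[A-Za-z_]\w*", source)]


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label(source, threshold=0.1):
    """Return (label, scores); label is "none" if no score reaches the threshold."""
    counts = Counter(tokenize(source))
    scores = {
        name: cosine(counts, Counter(words)) for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return (best if scores[best] >= threshold else "none"), scores


if __name__ == "__main__":
    snippet = "char buf[8]; strcpy(buf, input); gets(buf);"
    print(label(snippet)[0])
```

In a fuller pipeline, the resulting labels would be attached to an encoded representation of each source file to form the training set for a classifier such as a GLM; that step is omitted here.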
Pages: 13