A cosine similarity-based labeling technique for vulnerability type detection using source codes

Cited by: 1

Authors:

Ozturk, M. Maruf [1]

Affiliations:

[1] Suleyman Demirel Univ, Engn & Nat Sci Fac, Dept Comp Engn, Isparta, Turkiye

Source:

COMPUTERS & SECURITY | 2024 / Volume 146

Keywords:

Vulnerability detection; Cosine similarity; Generalized linear model; Labeling; Text encoding;

DOI:

10.1016/j.cose.2024.104059

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Vulnerability detection is important for the reliability of software systems. Although existing methods achieve remarkable success in vulnerability detection, they have two disadvantages: (1) Source code has a high noise ratio, so irrelevant information is removed before deep learning methods are applied in experiments that report high accuracy. However, deep learning-based detection methods require large-scale datasets, which makes vulnerability detection computationally difficult for small-scale software systems. (2) Most studies perform feature selection by processing vulnerability commits; despite considerable effort, few works detect vulnerabilities from source code. To address these two problems, this study proposes a novel labeling and vulnerability detection algorithm. The algorithm first analyzes source code with the help of a keyword vulnerability matrix. A final encoded matrix is then generated with word2vec, and the labeling vector is combined with the source code matrix to produce a trainable dataset for a generalized linear model (GLM). Unlike preceding studies, our method performs vulnerability detection using source code rather than vulnerability commits. In addition, similar studies generally target only one programming language. In contrast, our study develops vulnerability keywords for three programming languages (C#, Java, and C++) and creates the related labeling vectors from the keyword matrix. The proposed method outperformed the baseline approaches on most of the experimental datasets, with an area under the curve (AUC) above 90%. Furthermore, there is an average margin of 7.7% between our method and the alternatives in Recall, Precision, and F1-score across five types of vulnerabilities.
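
The labeling idea named in the title and abstract (scoring source code against vulnerability keywords by cosine similarity) can be illustrated with a minimal sketch. This is not the paper's algorithm: the keyword lists, the vulnerability type names, the `threshold` value, and the helper names `tokenize`, `cosine`, and `label_snippet` are all hypothetical, and the word2vec encoding and GLM training steps are omitted.

```python
import math
import re
from collections import Counter

# Hypothetical keyword lists per vulnerability type (illustrative only,
# not taken from the paper).
KEYWORDS = {
    "sql_injection": ["select", "insert", "executequery", "statement", "sql"],
    "buffer_overflow": ["strcpy", "memcpy", "gets", "malloc", "buffer"],
    "path_traversal": ["file", "path", "open", "fopen", "directory"],
}


def tokenize(source):
    """Lowercase the source text and split it into identifier-like tokens."""
    return re.findall(r"[a-z_][a-z0-9_]*", source.lower())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def label_snippet(source, threshold=0.1):
    """Return (label, score) for the best-matching vulnerability type.

    Falls back to the label "clean" when the best score is below the
    (assumed) threshold.
    """
    counts = Counter(tokenize(source))
    scores = {
        name: cosine(counts, Counter(words)) for name, words in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return "clean", scores[best]
    return best, scores[best]


if __name__ == "__main__":
    print(label_snippet("char buffer[8]; strcpy(buffer, input);"))
    print(label_snippet("int total = a + b;"))
```

In a fuller pipeline, labels produced this way would be attached to encoded source code vectors to form a training set for a classifier; the abstract names word2vec and a GLM for those steps.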

References

Pages: 13

79 entries in total

