Automatically Detect Software Security Vulnerabilities Based on Natural Language Processing Techniques and Machine Learning Algorithms

Cited by: 9

Authors:

Cho Do Xuan [1]

Vu Ngoc Son [2]

Duong Duc [2]

Affiliations:

[1] Posts & Telecommun Inst Technol, Fac Informat Assurance, Hanoi, Vietnam

[2] FPT Univ, Informat Assurance Dept, Hanoi, Vietnam

Source:

JOURNAL OF ICT RESEARCH AND APPLICATIONS | 2022, Vol. 16, No. 1

Keywords:

machine learning algorithms; natural language processing techniques; software security vulnerability detection; software vulnerabilities; source code features

DOI:

10.5614/itbj.ict.res.appl.2022.16.1.5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Software vulnerabilities pose a serious problem because cyber-attackers often compromise systems by exploiting them. Software vulnerabilities can be detected using two main methods: i) signature-based detection, i.e., methods that compare software against a list of known security vulnerabilities; ii) behavior analysis-based detection using classification algorithms, i.e., methods based on analyzing the software code. To improve the accuracy of software security vulnerability detection, this study proposes a new approach that combines a technique for analyzing and standardizing software code with the random forest (RF) classification algorithm. The novelty and advantage of the proposed method is that, to identify abnormal behavior of functions in the software, it does not try to define the behaviors of functions explicitly; instead, it uses the Word2vec natural language processing model to normalize the functions and extract their features. Finally, to detect security vulnerabilities in the functions, this study applies a popular and effective supervised machine learning algorithm to these features.
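The abstract does not spell out how the code is standardized before Word2vec is applied. As a rough illustration only, the sketch below shows one common way such a step might look for C-like functions: tokenize the source, then replace user-defined names and literals with placeholders so that the resulting token sequences form a small, consistent vocabulary. The function name `normalize_function`, the placeholder scheme (`FUN1`, `VAR1`, `NUM`, `STR`), and the keyword and library-call sets are all assumptions made for this example and are not taken from the paper.

```python
import re

# Assumed, deliberately small sets for C-like code; a real tool would use a full lexer.
C_KEYWORDS = {
    "int", "char", "void", "return", "if", "else", "for", "while",
    "sizeof", "struct", "const", "unsigned", "long", "float", "double",
}
# Library calls are kept verbatim because their names may signal risky behavior.
LIBRARY_CALLS = {"strcpy", "memcpy", "malloc", "free", "gets", "printf"}

# Matches string literals, identifiers, integer literals, or any single symbol.
TOKEN_RE = re.compile(r'"(?:\\.|[^"\\])*"|[A-Za-z_]\w*|\d+|\S')
IDENT_RE = re.compile(r"[A-Za-z_]\w*")


def normalize_function(source):
    """Tokenize a C-like function and map user-defined names to placeholders."""
    tokens = TOKEN_RE.findall(source)
    names = {}  # original identifier -> placeholder (hypothetical scheme)
    out = []
    for i, tok in enumerate(tokens):
        if tok.startswith('"'):
            out.append("STR")
        elif tok[0].isdigit():
            out.append("NUM")
        elif IDENT_RE.fullmatch(tok) and tok not in C_KEYWORDS | LIBRARY_CALLS:
            # An identifier directly followed by "(" is treated as a function name.
            is_call = i + 1 < len(tokens) and tokens[i + 1] == "("
            prefix = "FUN" if is_call else "VAR"
            if tok not in names:
                count = sum(1 for v in names.values() if v.startswith(prefix)) + 1
                names[tok] = f"{prefix}{count}"
            out.append(names[tok])
        else:
            out.append(tok)  # keywords, library calls, and punctuation
    return out


if __name__ == "__main__":
    code = "void copy(char *dst, char *src) { char buf[16]; strcpy(buf, src); }"
    print(" ".join(normalize_function(code)))
```

Token sequences of this kind could then serve as the "sentences" on which a Word2vec model is trained, with the resulting per-function vectors passed to a random forest classifier; those later stages are omitted here because the abstract gives no details about them.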

Pages: 70-88

Page count: 19

