CSGVD: A deep learning approach combining sequence and graph embedding for source code vulnerability detection

Cited by: 24

Authors:

Tang, Wei [1]

Tang, Mingwei [1]

Ban, Minchao [1]

Zhao, Ziguo [1]

Feng, Mingjun [2]

Affiliations:

[1] Xihua Univ, Sch Comp & Software Engn, Chengdu 610039, Sichuan, Peoples R China

[2] State Grid Tibet Elect Power Co Ltd, Beijing, Peoples R China

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2023 / Volume 199

Funding:

National Natural Science Foundation of China;

Keywords:

Graph neural networks; Vulnerability detection; Sequence embedding; Graph embedding; Pre-trained language model; Attention pooling; NEURAL-NETWORKS;

DOI:

10.1016/j.jss.2023.111623

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

In order to secure software, it is critical to detect potential vulnerabilities. The performance of traditional static vulnerability detection methods is limited by predefined rules, which rely heavily on the expertise of developers. Existing deep learning-based vulnerability detection models usually use only a single sequence or graph embedding approach to extract vulnerability features. Sequence embedding-based models ignore the structured information inherent in the code, and graph embedding-based models lack effective node and graph embedding methods. As a result, we propose a novel deep learning-based approach, CSGVD (Combining Sequence and Graph embedding for Vulnerability Detection), which considers function-level vulnerability detection as a graph binary classification task. Firstly, we propose a PE-BL module, which inherits and enhances the knowledge from the pre-trained language model. It extracts the code's local semantic features as node embedding in the control flow graph by using sequence embedding. Secondly, CSGVD uses graph neural networks to extract the structured information of the graph. Finally, we propose a mean biaffine attention pooling, M-BFA, to better aggregate node information as a graph's feature representation. The experimental results show that CSGVD outperforms the existing state-of-the-art models and obtains 64.46% accuracy on the real-world benchmark dataset from CodeXGLUE for vulnerability detection. (c) 2023 Elsevier Inc. All rights reserved.
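The final step in the abstract, pooling node embeddings into a single graph vector and classifying it, can be illustrated with a minimal sketch. This is not the paper's M-BFA: the abstract does not specify the biaffine scoring, so the sketch assumes plain dot-product attention with the mean node vector as the query. The names `attention_pool` and `classify`, and the weights passed to them, are hypothetical.

```python
import math


def attention_pool(node_vectors):
    """Pool node vectors into one graph vector with softmax attention.

    Assumption for illustration: the query is the mean node vector and
    the score is a plain dot product (not the paper's biaffine form).
    """
    n = len(node_vectors)
    dim = len(node_vectors[0])
    mean = [sum(v[d] for v in node_vectors) / n for d in range(dim)]
    # One attention score per node: dot product with the mean query.
    scores = [sum(a * b for a, b in zip(v, mean)) for v in node_vectors]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Graph vector: attention-weighted sum of node vectors.
    pooled = [
        sum(w * v[d] for w, v in zip(weights, node_vectors))
        for d in range(dim)
    ]
    return pooled, weights


def classify(graph_vector, w, b):
    """Binary classifier head: sigmoid of a linear score (hypothetical weights)."""
    logit = sum(x * y for x, y in zip(graph_vector, w)) + b
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Toy node embeddings for a three-node control flow graph.
    nodes = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    graph_vec, attn = attention_pool(nodes)
    print(attn, classify(graph_vec, [0.5, -0.5], 0.0))
```

In a full model the node vectors would come from the sequence encoder and graph neural network layers, and the classifier weights would be learned; here they are fixed toy values.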


Pages: 11

