Vulnerability Detection via Multiple-Graph-Based Code Representation

Cited by: 3

Authors:

Qiu, Fangcheng [1]

Liu, Zhongxin [1]

Hu, Xing [2]

Xia, Xin [3]

Chen, Gang [4]

Wang, Xinyu [4]

Affiliations:

[1] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 310027, Zhejiang, Peoples R China

[2] Zhejiang Univ, Sch Software Technol, Ningbo 315103, Zhejiang, Peoples R China

[3] Huawei, Software Engn Applicat Technol Lab, Hangzhou 310051, Zhejiang, Peoples R China

[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / No. 08

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Codes; Source coding; Graph neural networks; Software; Feature extraction; Deep learning; Vulnerability detection; deep learning; code representation; graph neural network;

DOI:

10.1109/TSE.2024.3427815

Chinese Library Classification number:

TP31 [Computer Software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

During software development and maintenance, vulnerability detection is an essential part of software quality assurance. Even though many program-analysis-based and machine-learning-based approaches have been proposed to automatically detect vulnerabilities, they rely on explicit rules or patterns defined by security experts and suffer from either high false positives or high false negatives. Recently, an increasing number of studies leverage deep learning techniques, especially Graph Neural Networks (GNNs), to detect vulnerabilities. These approaches leverage program analysis to represent the program semantics as graphs and perform graph analysis to detect vulnerabilities. However, they suffer from two main problems: (i) existing GNN-based techniques do not effectively learn the structural and semantic features from source code for vulnerability detection; (ii) these approaches tend to ignore fine-grained information in source code. To tackle these problems, we propose a novel vulnerability detection approach, named MGVD (Multiple-Graph-Based Vulnerability Detection), to detect vulnerable functions. To effectively learn the structural and semantic features from source code, MGVD represents each function in three forms: two statement graphs and a sequence of tokens. These representations are then encoded into a three-channel feature matrix, which contains the structural and semantic features of the function, and a weight allocation layer distributes the weights between structural and semantic features. To overcome the second problem, MGVD constructs each graph representation of the input function using multiple different graphs instead of a single graph. Each graph focuses on one statement in the function, and its nodes denote the related statements and their fine-grained code elements. Finally, MGVD leverages a CNN to identify whether the function is vulnerable based on this feature matrix. We conduct experiments on 3 vulnerability datasets with a total of 30,341 vulnerable functions and 127,931 non-vulnerable functions. The experimental results show that our method outperforms the state-of-the-art by 9.68% to 10.28% in terms of F1-score.
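For readers unfamiliar with the metric reported in the abstract, F1-score is the harmonic mean of precision and recall. The following is a minimal sketch of that computation, not code from the paper: the function name `f1_score` and the confusion counts in the usage lines are hypothetical, and only the two dataset sizes are taken from the abstract.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall, from confusion counts."""
    if tp == 0:
        # With no true positives, precision and recall are zero or undefined;
        # this sketch follows the common convention of reporting F1 as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Dataset sizes quoted in the abstract.
VULNERABLE = 30_341
NON_VULNERABLE = 127_931
vulnerable_share = VULNERABLE / (VULNERABLE + NON_VULNERABLE)

# Hypothetical confusion counts, for illustration only (not results from the paper).
example_f1 = f1_score(tp=80, fp=20, fn=40)

print(f"share of vulnerable functions: {vulnerable_share:.1%}")
print(f"example F1: {example_f1:.3f}")
```

Roughly one function in five across the three datasets is vulnerable, so a metric such as F1, which ignores true negatives, is more informative here than plain accuracy.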

Citation

Pages: 2178-2199

Page count: 22

71 references in total

