Vulnerability Detection via Multiple-Graph-Based Code Representation

Cited by: 3

Authors:

Qiu, Fangcheng [1]

Liu, Zhongxin [1]

Hu, Xing [2]

Xia, Xin [3]

Chen, Gang [4]

Wang, Xinyu [4]

Affiliations:

[1] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 310027, Zhejiang, Peoples R China

[2] Zhejiang Univ, Sch Software Technol, Ningbo 315103, Zhejiang, Peoples R China

[3] Huawei, Software Engn Applicat Technol Lab, Hangzhou 310051, Zhejiang, Peoples R China

[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China

Source:

IEEE TRANSACTIONS ON SOFTWARE ENGINEERING | 2024 / Vol. 50 / Issue 08

Funding:

National Natural Science Foundation of China;

Keywords:

Semantics; Codes; Source coding; Graph neural networks; Software; Feature extraction; Deep learning; Vulnerability detection; deep learning; code representation; graph neural network;

DOI:

10.1109/TSE.2024.3427815

Chinese Library Classification (CLC) number:

TP31 [Computer Software];

Subject classification code:

081202 ; 0835 ;

Abstract:

During software development and maintenance, vulnerability detection is an essential part of software quality assurance. Even though many program-analysis-based and machine-learning-based approaches have been proposed to automatically detect vulnerabilities, they rely on explicit rules or patterns defined by security experts and suffer from either high false positives or high false negatives. Recently, an increasing number of studies leverage deep learning techniques, especially Graph Neural Networks (GNNs), to detect vulnerabilities. These approaches leverage program analysis to represent the program semantics as graphs and perform graph analysis to detect vulnerabilities. However, they suffer from two main problems: (i) existing GNN-based techniques do not effectively learn the structural and semantic features from source code for vulnerability detection; (ii) these approaches tend to ignore fine-grained information in source code. To tackle these problems, in this paper, we propose a novel vulnerability detection approach, named MGVD (Multiple-Graph-Based Vulnerability Detection), to detect vulnerable functions. To effectively learn the structural and semantic features from source code, MGVD represents each function in three different forms, i.e., two statement graphs and a sequence of tokens. We then encode these representations into a three-channel feature matrix, which contains both the structural features and the semantic features of the function. We also add a weight allocation layer to distribute the weights between structural and semantic features. To overcome the second problem, MGVD constructs each graph representation of the input function using multiple different graphs instead of a single graph. Each graph focuses on one statement in the function, and its nodes denote the related statements and their fine-grained code elements. Finally, MGVD leverages a CNN to identify whether the function is vulnerable based on this feature matrix. We conduct experiments on 3 vulnerability datasets with a total of 30,341 vulnerable functions and 127,931 non-vulnerable functions. The experimental results show that our method outperforms the state-of-the-art by 9.68% - 10.28% in terms of F1-score.
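The abstract outlines a pipeline of three representations, a three-channel feature matrix, a weight allocation layer, and a CNN classifier, but it does not give the encoders, matrix dimensions, or layer definitions. The following is only a minimal illustrative sketch of that general shape, not the authors' implementation. Every name here (`softmax`, `weight_channels`, `classify`), the use of softmax weights as a stand-in for the weight allocation layer, the single convolution kernel with global max pooling, and all numeric values are assumptions made for illustration.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def weight_channels(channels, logits):
    """Scale each channel by a softmax weight.

    Assumption: a hypothetical stand-in for a weight allocation layer;
    in a trained model the logits would be learned parameters.
    """
    weights = softmax(logits)
    return [[[w * v for v in row] for row in ch]
            for ch, w in zip(channels, weights)]

def classify(channels, kernel, bias=0.0):
    """Toy CNN head: one multi-channel kernel, global max pool, sigmoid.

    channels: list of C matrices (rows x cols)
    kernel:   list of C square matrices (k x k), one per channel
    """
    k = len(kernel[0])
    rows, cols = len(channels[0]), len(channels[0][0])
    best = -math.inf
    for i in range(rows - k + 1):
        for j in range(cols - k + 1):
            s = bias
            for c, ch in enumerate(channels):
                for di in range(k):
                    for dj in range(k):
                        s += kernel[c][di][dj] * ch[i + di][j + dj]
            best = max(best, s)
    return 1.0 / (1.0 + math.exp(-best))

# Made-up 4x5 feature matrices standing in for the encoded representations:
# two statement-graph channels and one token-sequence channel.
graph_a = [[0.1 * (i + j) for j in range(5)] for i in range(4)]
graph_b = [[0.05 * (i * j) for j in range(5)] for i in range(4)]
tokens = [[0.2 if (i + j) % 2 == 0 else -0.2 for j in range(5)]
          for i in range(4)]
channels = [graph_a, graph_b, tokens]

logits = [0.0, 0.0, 0.0]  # equal weights before any training
kernel = [[[0.5, -0.5], [0.25, 0.25]] for _ in channels]  # arbitrary values

weighted = weight_channels(channels, logits)
score = classify(weighted, kernel)
print(f"vulnerability score (untrained, illustrative): {score:.3f}")
```

The score is meaningless without trained parameters; the sketch only shows how three same-shaped channels can be weighted and passed through a convolutional classifier.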


Pages: 2178-2199

Page count: 22

