Vulnerability Detection via Multiple-Graph-Based Code Representation

Cited by: 3
Authors
Qiu, Fangcheng [1]
Liu, Zhongxin [1]
Hu, Xing [2]
Xia, Xin [3]
Chen, Gang [4]
Wang, Xinyu [4]
Affiliations
[1] Zhejiang Univ, State Key Lab Blockchain & Data Secur, Hangzhou 310027, Zhejiang, Peoples R China
[2] Zhejiang Univ, Sch Software Technol, Ningbo 315103, Zhejiang, Peoples R China
[3] Huawei, Software Engn Applicat Technol Lab, Hangzhou 310051, Zhejiang, Peoples R China
[4] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Semantics; Codes; Source coding; Graph neural networks; Software; Feature extraction; Deep learning; Vulnerability detection; deep learning; code representation; graph neural network;
DOI
10.1109/TSE.2024.3427815
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
During software development and maintenance, vulnerability detection is an essential part of software quality assurance. Although many program-analysis-based and machine-learning-based approaches have been proposed to detect vulnerabilities automatically, they rely on explicit rules or patterns defined by security experts and suffer from either high false-positive or high false-negative rates. Recently, a growing number of studies have leveraged deep learning techniques, especially Graph Neural Networks (GNNs), to detect vulnerabilities. These approaches use program analysis to represent program semantics as graphs and perform graph analysis to detect vulnerabilities. However, they suffer from two main problems: (i) existing GNN-based techniques do not effectively learn the structural and semantic features of source code for vulnerability detection, and (ii) these approaches tend to ignore fine-grained information in source code. To tackle these problems, we propose a novel vulnerability detection approach, named MGVD (Multiple-Graph-Based Vulnerability Detection), to detect vulnerable functions. To effectively learn structural and semantic features from source code, MGVD represents each function in three different forms: two statement graphs and a sequence of tokens. We then encode these representations into a three-channel feature matrix, which contains both the structural and the semantic features of the function, and add a weight allocation layer to distribute weights between the structural and semantic features. To address the second problem, MGVD constructs each graph representation of the input function from multiple graphs instead of a single graph. Each graph focuses on one statement in the function, and its nodes denote the related statements and their fine-grained code elements. Finally, MGVD uses a CNN to identify whether the function is vulnerable based on this feature matrix. We conduct experiments on 3 vulnerability datasets with a total of 30,341 vulnerable functions and 127,931 non-vulnerable functions. The experimental results show that our method outperforms the state of the art by 9.68% - 10.28% in terms of F1-score.
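The abstract describes a three-channel feature matrix followed by a weight allocation layer. The Python sketch below illustrates one way such a step could look. It is not the authors' implementation: the function names `softmax` and `allocate_weights`, the use of a softmax over one logit per channel, and all embedding values are assumptions made for illustration only.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def allocate_weights(channels, channel_logits):
    """Scale each channel by its softmax weight.

    Assumption: one scalar weight per channel; the abstract does not
    specify how the weight allocation layer is parameterized.
    Returns (weighted_channels, weights).
    """
    if len(channels) != len(channel_logits):
        raise ValueError("one logit per channel is required")
    weights = softmax(channel_logits)
    weighted = [
        [[w * value for value in row] for row in channel]
        for channel, w in zip(channels, weights)
    ]
    return weighted, weights

# Toy function with 2 statements and 3-dimensional embeddings (made-up numbers).
graph_a = [[1.0, 0.0, 2.0], [0.5, 0.5, 0.5]]  # first statement-graph channel
graph_b = [[0.0, 1.0, 1.0], [2.0, 0.0, 0.0]]  # second statement-graph channel
tokens = [[1.0, 1.0, 1.0], [0.0, 3.0, 0.0]]   # token-sequence channel

# Equal logits give every channel the same weight.
feature_matrix, weights = allocate_weights(
    [graph_a, graph_b, tokens], [0.0, 0.0, 0.0]
)
```

In this sketch the logits would be learned parameters in a trained model, and the resulting `feature_matrix` (channels x statements x embedding size) is the kind of tensor a CNN classifier could consume.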
Pages: 2178-2199
Page count: 22