Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks

Cited by: 12
Authors
Zhao, Zhehao [1]
Yang, Bo [2]
Li, Ge [1]
Liu, Huai [3]
Jin, Zhi [1]
Affiliations
[1] Peking Univ, Key Lab High Confidence Software Technol, Beijing 100871, Peoples R China
[2] Beijing Forestry Univ, Sch Informat Sci & Technol, Beijing 100083, Peoples R China
[3] Swinburne Univ Technol, Dept Comp Technol, Hawthorn, Vic 3122, Australia
Funding
National Key R&D Program of China; National Natural Science Foundation of China; Australian Research Council
Keywords
Graph neural network; Program analysis; Deep learning; Abstract syntax tree; Control flow graph; NEURAL-NETWORK
DOI
10.1016/j.jss.2021.111108
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Deep learning is used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the need for manual feature engineering, the construction of the source code model significantly affects performance on those tasks. Most recent work has focused on complementing AST-based source code models with contextual dependencies extracted from the CFG. However, these approaches pay little attention to the representation of basic blocks, which are the basis of contextual dependencies. In this paper, we integrate the AST and the CFG and propose a novel source code model embedded with hierarchical dependencies. Based on that model, we also design a neural network that relies on the graph attention mechanism. Specifically, we introduce the syntactic structure of each basic block, i.e., its corresponding AST, into the source code model to provide sufficient information and fill the gap. We evaluated this model on three practical software engineering tasks and compared it with other state-of-the-art methods. The results show that our model can significantly improve performance. For example, compared to the best-performing baseline, our model reduces the number of parameters by 50% and achieves a 4% improvement in accuracy on the program classification task. © 2021 Elsevier Inc. All rights reserved.
Pages: 13