Precise Learning of Source Code Contextual Semantics via Hierarchical Dependence Structure and Graph Attention Networks

Cited by: 12

Authors:

Zhao, Zhehao [1]

Yang, Bo [2]

Li, Ge [1]

Liu, Huai [3]

Jin, Zhi [1]

Affiliations:

[1] Peking Univ, Key Lab High Confidence Software Technol, Beijing 100871, Peoples R China

[2] Beijing Forestry Univ, Sch Informat Sci & Technol, Beijing 100083, Peoples R China

[3] Swinburne Univ Technol, Dept Comp Technol, Hawthorn, Vic 3122, Australia

Source:

JOURNAL OF SYSTEMS AND SOFTWARE | 2022 / Vol. 184

Funding:

National Key Research and Development Program of China; National Natural Science Foundation of China; Australian Research Council;

Keywords:

Graph neural network; Program analysis; Deep learning; Abstract syntax tree; Control flow graph; NEURAL-NETWORK;

DOI:

10.1016/j.jss.2021.111108

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

Deep learning is used extensively in a variety of software engineering tasks, e.g., program classification and defect prediction. Although the technique eliminates the need for manual feature engineering, the construction of the source code model significantly affects performance on those tasks. Most recent work has focused on complementing source code models based on the abstract syntax tree (AST) with contextual dependencies extracted from the control flow graph (CFG). However, these approaches pay little attention to the representation of basic blocks, which are the basis of contextual dependencies. In this paper, we integrate the AST and the CFG and propose a novel source code model embedded with hierarchical dependencies. Based on that model, we also design a neural network that relies on the graph attention mechanism. Specifically, we introduce the syntactic structure of each basic block, i.e., its corresponding AST, into the source code model to provide sufficient information and fill this gap. We evaluated the model on three practical software engineering tasks and compared it with other state-of-the-art methods. The results show that our model can significantly improve performance. For example, compared to the best-performing baseline, our model reduces the number of parameters by 50% and achieves a 4% improvement in accuracy on the program classification task. (c) 2021 Elsevier Inc. All rights reserved.
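
As a rough illustration of the hierarchical idea described in the abstract (a CFG whose basic blocks each carry their own AST), the following Python sketch builds a two-level structure by hand. It is not the authors' implementation: the `BasicBlock` class, the `successors` helper, the example program, and the hand-written edge list are all assumptions made for illustration, and no graph attention layer is included.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class BasicBlock:
    """One CFG node (hypothetical type): a code fragment plus its own AST."""
    block_id: int
    source: str
    tree: ast.AST = field(init=False)

    def __post_init__(self):
        # Inner level: the syntactic structure of this block.
        self.tree = ast.parse(self.source)

    def node_types(self):
        """Names of the AST node types in this block, in ast.walk order."""
        return [type(node).__name__ for node in ast.walk(self.tree)]


# Blocks split by hand (an assumption, not an automatic CFG builder) for:
#   x = 0
#   while x < 3:
#       x += 1
#   print(x)
blocks = [
    BasicBlock(0, "x = 0"),
    BasicBlock(1, "x < 3"),
    BasicBlock(2, "x += 1"),
    BasicBlock(3, "print(x)"),
]

# Outer level: control-flow edges between blocks, also written by hand.
cfg_edges = [(0, 1), (1, 2), (2, 1), (1, 3)]


def successors(block_id):
    """Blocks reachable from block_id in one control-flow step."""
    return [dst for src, dst in cfg_edges if src == block_id]


if __name__ == "__main__":
    for block in blocks:
        print(block.block_id, successors(block.block_id), block.node_types())
```

In this sketch the edge list plays the role of the contextual (control-flow) dependencies, while each block's `tree` holds the syntactic detail that a block-level representation could draw on.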


Pages: 13

References: 55 in total

