Defect Prediction With Semantics and Context Features of Codes Based on Graph Representation Learning

Cited by: 26
Authors
Xu, Jiaxi [1]
Wang, Fei [1]
Ai, Jun [1]
Affiliations
[1] Beihang Univ, Sch Reliabil & Syst Engn, Beijing 100191, Peoples R China
Keywords
Software; Software development management; Measurement; Semantics; Syntactics; Data mining; Computer bugs; Deep learning; defect prediction; graph representation learning; software defect dataset; software engineering; SOFTWARE; QUALITY; METRICS;
DOI
10.1109/TR.2020.3040191
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
To optimize the process of software testing and to improve software quality and reliability, many attempts have been made to develop more effective methods for predicting software defects. Previous work on defect prediction has used machine learning and artificial software metrics. Unfortunately, artificial metrics are unable to represent the features of syntactic, semantic, and context information of defective modules. In this article, therefore, we propose a practical approach for identifying software defect patterns via the combination of semantics and context information using abstract syntax tree representation learning. Graph neural networks are also leveraged to capture the latent defect information of defective subtrees, which are pruned based on a fix-inducing change. To validate the proposed approach for predicting defects, we define mining rules based on the GitHub workflow and collect 6052 defects from 307 projects. The experiments indicate that the proposed approach performs better than the state-of-the-art approach and five traditional machine learning baselines. An ablation study shows that the information about code concepts leads to a significant increase in accuracy.
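As a rough illustration of the first step the abstract mentions, representing code as an abstract syntax tree in graph form, the following is a minimal sketch only. It uses Python's standard `ast` module as a stand-in parser, which is an assumption made for illustration; the function name `ast_to_graph` is hypothetical, and the sketch covers neither the subtree pruning nor the graph neural network described above.

```python
import ast


def ast_to_graph(source):
    """Turn source code into (labels, edges).

    labels[i] is the AST node type name of node i;
    edges is a list of (parent_index, child_index) pairs.
    Hypothetical helper for illustration, not the paper's pipeline.
    """
    labels = []
    edges = []

    def visit(node, parent_index):
        # Assign a fresh index on every visit so each occurrence
        # of a node in the tree gets its own graph node.
        index = len(labels)
        labels.append(type(node).__name__)
        if parent_index is not None:
            edges.append((parent_index, index))
        for child in ast.iter_child_nodes(node):
            visit(child, index)

    visit(ast.parse(source), None)
    return labels, edges


labels, edges = ast_to_graph("def f(x):\n    return x + 1\n")
print(labels[0], len(labels), len(edges))
```

The resulting node labels and parent-child edges are the kind of input a graph learning model could consume; how nodes are embedded and which extra edges are added are design choices this record does not specify.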
Pages: 613-625
Number of pages: 13