PathPair2Vec: An AST path pair-based code representation method for defect prediction

Cited by: 47
Authors
Shi, Ke [1]
Lu, Yang [1, 2]
Chang, Jingfei [1]
Wei, Zhen [1, 2]
Affiliations
[1] Hefei Univ Technol, Sch Comp Sci & Informat Engn, Hefei, Anhui, Peoples R China
[2] Minist Educ, Engn Res Ctr Safety Crit Ind Measurement & Contro, Hefei 230009, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Defect prediction; AST path; Deep learning; Representation learning;
DOI
10.1016/j.cola.2020.100979
Chinese Library Classification (CLC) number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Software project defect prediction (SDP) estimates the probability that software contains bugs from its features, so that testing effort can be allocated accordingly. Existing SDP methods fall into two categories: those based on traditional handcrafted features and those based on automatically learned abstract features, especially features learned by deep learning. Current research indicates that automatically learned deep features achieve better performance than handcrafted features. Code2vec (Alon et al. 2019) is one of the best source code representation models; it uses deep learning to learn representations directly from code. In this paper, inspired by code2vec, we propose a new AST path pair-based source code representation method (PathPair2Vec) and apply it to software project defect prediction. We first propose the concept of the short path to describe each terminal node and its control logic. Then, we design a new sequence encoding method to encode the different parts of the terminal node and its control logic. Finally, we describe the semantic information of the code with pairs of short paths and fuse them with an attention mechanism. Experiments on the PROMISE dataset show that our method improves the F1 score by 17.88% over the state-of-the-art SDP method, and that the AST path pair-based source code representation better identifies the defect features of source code.
Pages: 11
Related papers
34 in total
[1]  
Allamanis M., 2018, INT C LEARN REPR
[2]  
Allamanis M, 2016, PR MACH LEARN RES, V48
[3]  
Alon U., 2019, INT C LEARNING REPRE
[4]   code2vec: Learning Distributed Representations of Code [J].
Alon, Uri ;
Zilberstein, Meital ;
Levy, Omer ;
Yahav, Eran .
PROCEEDINGS OF THE ACM ON PROGRAMMING LANGUAGES-PACMPL, 2019, 3 (POPL)
[5]  
Alon U, 2018, ACM SIGPLAN NOTICES, V53, P404, DOI [10.1145/3296979.3192412, 10.1145/3192366.3192412]
[6]  
[Anonymous], 1994, P WORKSH PRAGM THEOR
[7]   A hierarchical model for object-oriented design quality assessment [J].
Bansiya, J ;
Davis, CG .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 2002, 28 (01) :4-17
[8]   Where Is the Bug and How Is It Fixed? An Experiment with Practitioners [J].
Bohme, Marcel ;
Soremekun, Ezekiel O. ;
Chattopadhyay, Sudipta ;
Ugherughe, Emamurho ;
Zeller, Andreas .
ESEC/FSE 2017: PROCEEDINGS OF THE 2017 11TH JOINT MEETING ON FOUNDATIONS OF SOFTWARE ENGINEERING, 2017, :117-128
[9]  
Chidamber S.R., 1994, METRICS SUITE OBJECT, P197
[10]   A METRICS SUITE FOR OBJECT-ORIENTED DESIGN [J].
CHIDAMBER, SR ;
KEMERER, CF .
IEEE TRANSACTIONS ON SOFTWARE ENGINEERING, 1994, 20 (06) :476-493