Automatically Learning Semantic Features for Defect Prediction

Cited by: 458
Authors
Wang, Song [1]
Liu, Taiyue [1]
Tan, Lin [1]
Affiliations
[1] Univ Waterloo, Elect & Comp Engn, Waterloo, ON, Canada
Source
2016 IEEE/ACM 38TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE) | 2016
Funding
Natural Sciences and Engineering Research Council of Canada
Keywords
DOI
10.1145/2884781.2884804
Chinese Library Classification
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Software defect prediction, which predicts defective code regions, can help developers find bugs and prioritize their testing efforts. To build accurate prediction models, previous studies focus on manually designing features that encode the characteristics of programs and exploring different machine learning algorithms. Existing traditional features often fail to capture the semantic differences of programs, and such a capability is needed for building accurate prediction models. To bridge the gap between programs' semantics and defect prediction features, this paper proposes to leverage a powerful representation-learning algorithm, deep learning, to learn semantic representation of programs automatically from source code. Specifically, we leverage Deep Belief Network (DBN) to automatically learn semantic features from token vectors extracted from programs' Abstract Syntax Trees (ASTs). Our evaluation on ten open source projects shows that our automatically learned semantic features significantly improve both within-project defect prediction (WPDP) and cross-project defect prediction (CPDP) compared to traditional features. Our semantic features improve WPDP on average by 14.7% in precision, 11.5% in recall, and 14.2% in F1. For CPDP, our semantic features based approach outperforms the state-of-the-art technique TCA+ with traditional features by 8.9% in F1.
Pages: 297-308
Page count: 12