Software Visualization and Deep Transfer Learning for Effective Software Defect Prediction

Times cited: 68
Authors
Chen, Jinyin [1 ]
Hu, Keke [1 ]
Yu, Yue [2 ]
Chen, Zhuangzhi [1 ]
Xuan, Qi [3 ]
Liu, Yi [4 ]
Filkov, Vladimir [5 ]
Affiliations
[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Hefei 230000, Peoples R China
[3] Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310023, Peoples R China
[4] Zhejiang Univ Technol, Inst Proc Equipment & Control Engn, Hangzhou 310023, Peoples R China
[5] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Source
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China;
Keywords
Cross-project defect prediction; within-project defect prediction; deep transfer learning; self-attention; software visualization; METRICS;
DOI
10.1145/3377811.3380389
Chinese Library Classification number
TP31 [Computer software];
Discipline classification code
081202 ; 0835 ;
Abstract
Software defect prediction aims to automatically locate defective code modules to better focus testing resources and human effort. Typically, software defect prediction pipelines consist of two parts: the first extracts program features, such as abstract syntax trees, using external tools, and the second applies machine learning-based classification models to those features to predict defective modules. Since such approaches depend on specific feature extraction tools, machine learning classifiers have to be custom-tailored to build the most accurate models. To bridge the gap between deep learning and defect prediction, we propose an end-to-end framework that can directly produce prediction results for programs without using feature-extraction tools. To that end, we first visualize programs as images, apply the self-attention mechanism to extract image features, use transfer learning to reduce the difference in sample distributions between projects, and finally feed the image files into a pre-trained deep learning model for defect prediction. Experiments with 10 open source projects from the PROMISE dataset show that our method can improve cross-project and within-project defect prediction. Our code and data pointers are available at https://zenodo.org/record/3373409#.XV0Oy5Mza35.
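The abstract does not say how a program is turned into an image, so the following is only a minimal sketch of one plausible reading of the "visualize programs as images" step. It assumes that consecutive bytes of the source text are grouped in threes and used as the red, green, and blue values of one pixel. The function names `source_to_pixels` and `pixels_to_rows`, the zero padding, and the fixed row width are all hypothetical choices made for illustration, not the authors' implementation.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def source_to_pixels(source: str) -> List[Pixel]:
    """Map source text to RGB pixels, three bytes per pixel (assumed scheme)."""
    data = source.encode("utf-8")
    # Pad with zero bytes so the length is a multiple of three.
    remainder = len(data) % 3
    if remainder:
        data += bytes(3 - remainder)
    return [(data[i], data[i + 1], data[i + 2]) for i in range(0, len(data), 3)]


def pixels_to_rows(pixels: List[Pixel], width: int) -> List[List[Pixel]]:
    """Arrange pixels into rows of a fixed width, padding the last row with black."""
    if width <= 0:
        raise ValueError("width must be positive")
    black: Pixel = (0, 0, 0)
    padded = pixels + [black] * (-len(pixels) % width)
    return [padded[i:i + width] for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # A tiny stand-in for a source file; a real pipeline would read files from disk.
    image = pixels_to_rows(source_to_pixels("int x;\n"), width=2)
    for row in image:
        print(row)
```

Under this assumed scheme, each source file becomes a grid of pixel values that an image model can consume directly, with no separate feature-extraction tool in between.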
Pages: 578-589
Page count: 12