Software Visualization and Deep Transfer Learning for Effective Software Defect Prediction

Times cited: 68
Authors
Chen, Jinyin [1]
Hu, Keke [1]
Yu, Yue [2]
Chen, Zhuangzhi [1]
Xuan, Qi [3]
Liu, Yi [4]
Filkov, Vladimir [5]
Affiliations
[1] Zhejiang Univ Technol, Coll Informat Engn, Hangzhou 310023, Peoples R China
[2] Natl Univ Def Technol, Coll Comp, Hefei 230000, Peoples R China
[3] Zhejiang Univ Technol, Inst Cyberspace Secur, Hangzhou 310023, Peoples R China
[4] Zhejiang Univ Technol, Inst Proc Equipment & Control Engn, Hangzhou 310023, Peoples R China
[5] Univ Calif Davis, Dept Comp Sci, Davis, CA 95616 USA
Source
2020 ACM/IEEE 42ND INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE 2020) | 2020
Funding
Natural Science Foundation of Zhejiang Province; National Natural Science Foundation of China
Keywords
Cross-project defect prediction; within-project defect prediction; deep transfer learning; self-attention; software visualization; METRICS;
DOI
10.1145/3377811.3380389
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Software defect prediction aims to automatically locate defective code modules to better focus testing resources and human effort. Typically, software defect prediction pipelines comprise two parts: the first extracts program features, like abstract syntax trees, by using external tools, and the second applies machine learning-based classification models to those features in order to predict defective modules. Since such approaches depend on specific feature extraction tools, machine learning classifiers have to be custom-tailored to build the most accurate models. To bridge the gap between deep learning and defect prediction, we propose an end-to-end framework which can directly get prediction results for programs without utilizing feature-extraction tools. To that end, we first visualize programs as images, apply the self-attention mechanism to extract image features, use transfer learning to reduce the difference in sample distributions between projects, and finally feed the image files into a pre-trained, deep learning model for defect prediction. Experiments with 10 open source projects from the PROMISE dataset show that our method can improve cross-project and within-project defect prediction. Our code and data pointers are available at https://zenodo.org/record/3373409#.XV0Oy5Mza35.
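The first step of the pipeline, visualizing a program as an image, can be illustrated with a minimal sketch. The abstract does not specify the encoding, so the scheme below is an assumption for illustration only: each group of three consecutive UTF-8 bytes of the source text becomes one RGB pixel, and the tail is zero-padded to fill the last row. The function name `source_to_pixels` and the `width` parameter are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]


def source_to_pixels(source: str, width: int = 4) -> List[List[Pixel]]:
    """Map source text to rows of RGB pixels (assumed scheme: 3 bytes per pixel)."""
    data = source.encode("utf-8")
    # Zero-pad so the byte count fills whole rows of `width` pixels.
    row_bytes = 3 * width
    data += bytes((-len(data)) % row_bytes)
    # Group consecutive bytes into (R, G, B) triples, then into rows.
    pixels = [tuple(data[i:i + 3]) for i in range(0, len(data), 3)]
    return [pixels[r:r + width] for r in range(0, len(pixels), width)]


if __name__ == "__main__":
    for row in source_to_pixels("int x = 0;", width=2):
        print(row)
```

A grid produced this way could be saved as an image file and passed to an image classifier; the self-attention and transfer-learning stages named in the abstract are not covered by this sketch.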
Pages: 578-589
Page count: 12