DNNAttention: A deep neural network and attention based architecture for cross project defect number prediction

Cited by: 14
Authors
Pandey, Sushant Kumar [1 ]
Tripathi, Anil Kumar [1 ]
Affiliations
[1] Indian Inst Technol BHU, Dept Comp Sci & Engn, Varanasi, Uttar Pradesh, India
Keywords
Cross project defect prediction; Deep neural network; Attention layer; Long short term memory (LSTM); Software defect number prediction; ENSEMBLE; MODEL;
DOI
10.1016/j.knosys.2021.107541
CLC classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Software defect prediction (SDP) is the process of detecting fault-prone classes or modules in a software system; it helps allocate resources more optimally before the testing phase. When an adequate dataset is lacking, defects can be predicted by training the classifier on data from other projects, an approach called cross-project defect prediction (CPDP). Cross-project defect number prediction (CPDNP) goes one step further than CPDP: it also estimates the number of defects in each module of a software system, which we treat as a regression problem. This article deals with the CPDNP problem and proposes a CPDNP architecture, called DNNAttention, that combines a deep neural network with an attention layer. We synthesize a substantial dataset, named cross-heap, by amalgamating 44 projects from the PROMISE repository. We feed the cross-heap into DNNAttention and train and evaluate it over the 44 datasets using transfer learning. We also address the class imbalance (CI) and overfitting problems by employing multi-label random over-sampling and dropout regularization, respectively. We compared the performance of DNNAttention against eight baseline methods using mean squared error (MSE), mean absolute error (MAE), and accuracy. Out of the 44 projects, DNNAttention achieves the minimum MSE on 19 and the minimum MAE on 20, and its accuracy surpasses existing techniques on 19 projects. We also compared performance in terms of Kendall correlation and Fault-Percentile-Average against a recent unsupervised method and found that DNNAttention significantly outperforms it. Moreover, the improvements of DNNAttention over the other baseline methods in terms of MAE, MSE, and accuracy when inspecting 20% of the lines of code are substantial. In most cases the improvements are statistically significant with a large effect size across all 44 projects. (c) 2021 Elsevier B.V. All rights reserved.
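
To make the described architecture concrete, below is a minimal sketch of a deep-network-plus-attention regressor for per-module defect counts. It is an illustration under stated assumptions, not the authors' exact model: a Keras/TensorFlow implementation is assumed, and the LSTM and dense layer sizes, the soft-attention formulation, the 20-metric input, and the synthetic training arrays are hypothetical stand-ins for the paper's cross-heap data and tuned configuration.

# Minimal sketch of a DNN + attention regressor for defect-count prediction.
# Assumptions (not from the paper): Keras/TensorFlow backend, 20 static code
# metrics per module treated as a short sequence, a simple soft-attention
# layer over an LSTM encoding, MSE loss, and dropout against overfitting.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

n_features = 20  # e.g. static code metrics from the PROMISE datasets (illustrative)

inputs = layers.Input(shape=(n_features, 1))           # metrics as a 1-channel sequence
x = layers.LSTM(64, return_sequences=True)(inputs)     # per-step encodings

# Soft attention: score each step, normalize the scores, take the weighted sum.
scores = layers.Dense(1, activation="tanh")(x)          # (batch, steps, 1)
weights = layers.Softmax(axis=1)(scores)                # attention weights over steps
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([x, weights])

h = layers.Dense(32, activation="relu")(context)
h = layers.Dropout(0.3)(h)                              # dropout regularization
outputs = layers.Dense(1, activation="relu")(h)         # non-negative defect count

model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Cross-project flavor of training: fit on pooled source-project data (the
# "cross-heap" in the paper), then evaluate on a held-out target project.
# The arrays below are random placeholders, not real PROMISE data.
X_src = np.random.rand(1000, n_features, 1).astype("float32")
y_src = np.random.poisson(0.5, size=(1000, 1)).astype("float32")
model.fit(X_src, y_src, epochs=5, batch_size=32, verbose=0)

The MSE loss and MAE metric mirror the evaluation measures reported in the abstract; multi-label random over-sampling of the training data would be applied before the fit step and is omitted here.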
Pages: 30