An empirical study based on semi-supervised hybrid self-organizing map for software fault prediction

Cited by: 73
Authors
Abaei, Golnoush [1]
Selamat, Ali [1]
Fujita, Hamido [2]
Affiliations
[1] Univ Teknol Malaysia, Fac Comp, Dept Software Engn, Software Engn Res Grp, UTM Johor Bahru 81310, Johor, Malaysia
[2] Iwate Prefectural Univ, Takizawa, Japan
Keywords
Artificial neural network; Clustering; Self-organizing maps; Semi-supervised; Software fault prediction; Threshold; METRICS;
DOI
10.1016/j.knosys.2014.10.017
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Software testing is a crucial task in the software development process, with the potential to save time and budget by recognizing defects as early as possible and delivering a more defect-free product. To improve the testing process, fault prediction approaches identify the parts of the system that are more defect-prone. However, when defect data or quality-based class labels are not available, or the company does not have similar or earlier versions of the software project, researchers cannot use supervised classification methods for defect detection. To detect the defect proneness of modules in software projects with high accuracy and to improve the generalization ability of the detection model, we propose an automated software fault detection model using a semi-supervised hybrid self-organizing map (HySOM). HySOM is a semi-supervised model based on a self-organizing map and an artificial neural network. The advantage of HySOM is its ability to predict the labels of modules in a semi-supervised manner, using software measurement threshold values in the absence of quality data. In semi-supervised HySOM, the role of the expert in identifying fault-prone modules becomes less critical and more supportive. We have benchmarked the proposed model on eight industrial data sets from NASA and from Turkish white-goods embedded controller software. For the NASA data sets, the results show improvement in false negative rate in 80% of the cases and in overall error rate in 60% of the cases. Moreover, we compare the performance of the proposed model with that of other recently proposed methods. According to the results, our semi-supervised model can be used as an automated tool to guide testing effort by prioritizing the modules' defects, improving the quality of software development and software testing with less time and budget. © 2014 Elsevier B.V. All rights reserved.
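The threshold-based labeling described in the abstract can be illustrated with a minimal sketch. It is one plausible reading, not the authors' implementation: a tiny one-dimensional self-organizing map clusters unlabeled modules, each map unit is marked fault-prone if any of its weights exceeds a metric threshold, and a module inherits the label of its nearest unit. The metric names, threshold values, and function names are hypothetical, and the artificial neural network stage of HySOM is omitted. The two error measures follow their standard definitions.

```python
import random

# Hypothetical metrics and thresholds: illustrative values only,
# not taken from the paper.
METRICS = ("loc", "cyclomatic_complexity", "unique_operators")
THRESHOLDS = (65.0, 10.0, 25.0)


def nearest_unit(units, x):
    """Index of the unit whose weight vector is closest to x (squared Euclidean)."""
    return min(
        range(len(units)),
        key=lambda j: sum((w - xi) ** 2 for w, xi in zip(units[j], x)),
    )


def train_som(data, n_units=4, epochs=50, lr0=0.5, seed=0):
    """Train a tiny 1-D SOM with a shrinking 'bubble' neighborhood."""
    rng = random.Random(seed)
    # Initialize each unit from a randomly chosen training vector.
    units = [list(rng.choice(data)) for _ in range(n_units)]
    for epoch in range(epochs):
        decay = 1.0 - epoch / epochs
        lr = lr0 * decay                          # learning rate shrinks linearly
        radius = int(round((n_units / 2) * decay))  # neighborhood shrinks to 0
        for x in data:
            bmu = nearest_unit(units, x)
            for j in range(n_units):
                if abs(j - bmu) <= radius:
                    # Move the unit toward the sample.
                    units[j] = [w + lr * (xi - w) for w, xi in zip(units[j], x)]
    return units


def label_units(units, thresholds=THRESHOLDS):
    """Mark a unit fault-prone if any weight exceeds its metric threshold."""
    return [any(w > t for w, t in zip(unit, thresholds)) for unit in units]


def predict(units, unit_labels, x):
    """A module inherits the label of its nearest unit."""
    return unit_labels[nearest_unit(units, x)]


def error_rates(actual, predicted):
    """Return (false negative rate, overall error rate)."""
    pairs = list(zip(actual, predicted))
    fn = sum(1 for a, p in pairs if a and not p)
    fp = sum(1 for a, p in pairs if p and not a)
    tp = sum(1 for a, p in pairs if a and p)
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fnr, (fn + fp) / len(pairs)
```

In use, `train_som` would be run on the unlabeled metric vectors, `label_units` applied to the resulting map, and `predict` called per module; `error_rates` then compares those predictions with known fault labels where such labels exist for evaluation.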
Pages: 28-39
Number of pages: 12