Evaluating ensemble imputation in software effort estimation

Cited by: 0
Authors
Ibtissam Abnane
Ali Idri
Imane Chlioui
Alain Abran
Affiliations
[1] Mohammed V University, Software Project Management Research Team, ENSIAS
[2] Mohammed VI Polytechnic University, MSDA
[3] University of Quebec, Dept. of Software Engineering and Information Technology, ETS
Source
Empirical Software Engineering | 2023 / Vol. 28
Keywords
Missing data; Imputation; Ensemble; Software development effort estimation
DOI
Not available
Abstract
Choosing the appropriate missing data (MD) imputation technique for a given software development effort estimation (SDEE) technique is not a trivial task. The impact of MD imputation on the estimation output depends on both the dataset and the SDEE technique used, and no single imputation technique is best in all contexts. An attractive solution is therefore to use more than one imputation technique and combine their results into a final imputation outcome. This concept, called ensemble imputation, can significantly improve effort estimation accuracy. This study proposes and constructs 11 heterogeneous ensemble imputation techniques whose members combine two, three, or four of the following single imputation techniques: K-nearest neighbors (KNN), expectation maximization (EM), support vector regression (SVR), and decision trees (DTs). The effects of single and ensemble imputation techniques on SDEE performance were evaluated on six SDEE datasets (COCOMO81, ISBSG, Desharnais, China, Kemerer, and Miyazaki) using five performance measures: standardized accuracy (SA), prediction at level 0.25 (Pred(0.25)), mean balanced relative error (MBRE), mean inverted balanced relative error (MIBRE), and logarithmic standard deviation (LSD). Moreover, we used (1) the Scott-Knott (SK) statistical test to cluster and compare the results, and (2) the Borda count method to rank the SDEE techniques belonging to the best SK cluster.
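The abstract names the ingredients but not their mechanics, so the following standard-library Python sketch illustrates two of them under stated assumptions. It is not the authors' implementation. The first part shows ensemble imputation: several single imputers each fill the gaps, and their values are combined cell by cell. Averaging is an assumed combination rule. The KNN member is simplified (Euclidean distance over co-observed features, no normalization), and the column-mean member is a swapped-in stand-in for EM, SVR or DT, not one of the study's four techniques. The second part computes the five accuracy measures with their usual definitions: MRE = |y − ŷ|/y, BRE = |y − ŷ|/min(y, ŷ), IBRE = |y − ŷ|/max(y, ŷ), SA = (1 − MAR/MAR_P0) × 100, and LSD = sqrt(Σ(λᵢ + s²/2)²/(n − 1)) with λᵢ = ln yᵢ − ln ŷᵢ. Two further choices are assumptions. MAR_P0 is computed exactly as the expected MAR of random guessing rather than by Monte Carlo sampling, and s² is taken as the sample variance of the λᵢ. All function names (`mean_impute`, `knn_impute`, `ensemble_impute`, `sdee_measures`) are hypothetical.

```python
import math
import statistics
from typing import Callable, Dict, List, Optional

Row = List[Optional[float]]  # None marks a missing value
Imputer = Callable[[List[Row]], List[List[float]]]


def mean_impute(data: List[Row]) -> List[List[float]]:
    """Illustrative stand-in member (not from the study): fill each gap with its column mean."""
    ncols = len(data[0])
    means = [statistics.mean(r[c] for r in data if r[c] is not None)
             for c in range(ncols)]
    return [[means[c] if r[c] is None else r[c] for c in range(ncols)]
            for r in data]


def knn_impute(data: List[Row], k: int = 2) -> List[List[float]]:
    """Simplified KNN member: fill each gap with the mean of that column over
    the k nearest rows that observe it. Distance is Euclidean over the features
    both rows observe (no normalization, for brevity)."""
    out = []
    for i, row in enumerate(data):
        filled = list(row)
        for c, value in enumerate(row):
            if value is not None:
                continue
            neighbours = []
            for j, other in enumerate(data):
                if j == i or other[c] is None:
                    continue
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if shared:
                    dist = math.sqrt(sum((a - b) ** 2 for a, b in shared))
                    neighbours.append((dist, other[c]))
            neighbours.sort(key=lambda t: t[0])
            # Raises StatisticsError if no donor row exists for this cell.
            filled[c] = statistics.mean(v for _, v in neighbours[:k])
        out.append(filled)
    return out


def ensemble_impute(data: List[Row], members: List[Imputer]) -> List[List[float]]:
    """Run every member imputer and average their values cell by cell
    (assumed combination rule); observed cells are kept as they are."""
    outputs = [member(data) for member in members]
    return [[statistics.mean(o[i][c] for o in outputs) if data[i][c] is None
             else data[i][c] for c in range(len(data[0]))]
            for i in range(len(data))]


def sdee_measures(actual: List[float], predicted: List[float]) -> Dict[str, float]:
    """SA, Pred(0.25) (as a percentage), MBRE, MIBRE and LSD for positive efforts."""
    n = len(actual)
    abs_err = [abs(a - p) for a, p in zip(actual, predicted)]
    mar = sum(abs_err) / n
    # Exact expected MAR of random guessing (P0): each project is "predicted"
    # by the actual effort of another project chosen uniformly at random.
    mar_p0 = sum(abs(actual[i] - actual[j]) for i in range(n)
                 for j in range(n) if i != j) / (n * (n - 1))
    sa = (1 - mar / mar_p0) * 100
    pred25 = 100 * sum(e / a <= 0.25 for e, a in zip(abs_err, actual)) / n
    mbre = sum(e / min(a, p) for e, a, p in zip(abs_err, actual, predicted)) / n
    mibre = sum(e / max(a, p) for e, a, p in zip(abs_err, actual, predicted)) / n
    lam = [math.log(a) - math.log(p) for a, p in zip(actual, predicted)]
    s2 = statistics.variance(lam)  # assumed estimator of the residual variance
    lsd = math.sqrt(sum((x + s2 / 2) ** 2 for x in lam) / (n - 1))
    return {"SA": sa, "Pred(0.25)": pred25, "MBRE": mbre,
            "MIBRE": mibre, "LSD": lsd}


if __name__ == "__main__":
    projects = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 40.0]]
    print(knn_impute(projects))  # [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, 40.0]]
    print(ensemble_impute(projects, [mean_impute, knn_impute]))
    print(sdee_measures([100, 200, 300, 400], [110, 180, 330, 300]))
```

In the toy run, the two members disagree on the missing second-column cell (about 26.7 from the column mean versus 20.0 from KNN), and the ensemble reports their average. This disagreement between members is the situation ensemble imputation is meant to smooth over. In practice each member would be a full imputer, and the imputed dataset would then be passed to an SDEE technique whose predictions are scored with `sdee_measures`.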