Analogy Software Effort Estimation Using Ensemble KNN Imputation

Cited by: 16
Authors
Abnane, Ibtissam [1]
Hosni, Mohamed [1]
Idri, Ali [1]
Abran, Alain [2]
Affiliations
[1] Univ Mohammed 5, ENSIAS, Software Project Management Res Team, Rabat, Morocco
[2] Univ Quebec, Dept Software Engn & Informat Technol, ETS, Montreal, PQ, Canada
Source
2019 45TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2019) | 2019
Keywords
Analogy-based software effort estimation; standardized accuracy; missing data; imputation; ensemble; grid search; parameter optimization; COST ESTIMATION; SYSTEMS;
DOI
10.1109/SEAA.2019.00044
Chinese Library Classification number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Missing data are a serious issue that influences the prediction accuracy of software development effort estimation (SDEE) techniques, especially analogy-based software effort estimation (ASEE). Hence, appropriate handling of missing data is necessary to ensure the best performance. To deal with this issue, K-nearest neighbors (KNN) imputation has been widely used. However, none of the studies investigating KNN imputation in SDEE have addressed the impact of parameter settings on the imputation process, given that parameter optimization techniques are often used at the prediction level, as they strongly affect the performance of SDEE techniques, including ASEE. This paper proposes and evaluates an ensemble KNN imputation technique for ASEE. Thereafter, we compare ASEE performance using ensemble KNN imputation with that obtained using either a grid search-based single KNN imputation or KNN imputation without parameter optimization. For the six datasets used for comparison, the ensemble KNN imputation significantly improved ASEE performance compared with KNN imputation without optimization. Moreover, ensemble KNN imputation and grid search-based imputation behaved similarly. Given that grid search is time-consuming, the ensemble KNN imputation may be an alternative for dealing with missing data in the ASEE process.
Pages: 228-235
Number of pages: 8
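
The abstract contrasts a single KNN imputer, whose parameters must be tuned (for example by grid search), with an ensemble of KNN imputers. The abstract does not say how the ensemble members are configured or combined, so the following is only a minimal sketch under stated assumptions: members differ only in the number of neighbors k, the distance is a Euclidean distance normalized over the features both projects have observed, and the ensemble output is the plain average of the members' imputed values. The function names and the sample data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean


def distance(a, b):
    """Normalized Euclidean distance over features observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common observed feature: rows are not comparable
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k):
    """Fill each None with the mean of that feature over the k nearest donors."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have feature j observed.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = sorted(
                (d for d in donors if d[0] < math.inf), key=lambda d: d[0]
            )
            if donors:  # leave the gap if no comparable donor exists
                filled[i][j] = mean(v for _, v in donors[:k])
    return filled


def ensemble_knn_impute(rows, ks=(1, 2, 3)):
    """Average the values imputed by one KNN imputer per k (assumed combiner)."""
    runs = [knn_impute(rows, k) for k in ks]
    result = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is None and runs[0][i][j] is not None:
                result[i][j] = mean(run[i][j] for run in runs)
    return result


# Hypothetical project data: [size, effort]; None marks a missing value.
projects = [[1.0, 2.0], [1.1, None], [5.0, 10.0], [1.15, 2.2]]
completed = ensemble_knn_impute(projects)
```

The design point this illustrates is the one the abstract raises: the ensemble avoids searching for a single best k by combining several fixed settings, at the cost of running one imputation per member.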