Analogy Software Effort Estimation Using Ensemble KNN Imputation

Cited by: 16
Authors
Abnane, Ibtissam [1]
Hosni, Mohamed [1]
Idri, Ali [1]
Abran, Alain [2]
Affiliations
[1] Univ Mohammed 5, ENSIAS, Software Project Management Res Team, Rabat, Morocco
[2] Univ Quebec, Dept Software Engn & Informat Technol, ETS, Montreal, PQ, Canada
Source
2019 45TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2019) | 2019
Keywords
Analogy-based software effort estimation; standardized accuracy; missing data; imputation; ensemble; grid search; parameter optimization; COST ESTIMATION; SYSTEMS;
DOI
10.1109/SEAA.2019.00044
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
Missing data are a serious issue that influences the prediction accuracy of software development effort estimation (SDEE) techniques, especially analogy-based software effort estimation (ASEE). Hence, appropriate handling of missing data is necessary to ensure the best performance. To deal with this issue, K-nearest neighbors (KNN) imputation has been widely used. However, none of the studies investigating KNN imputation in SDEE have addressed the impact of parameter settings on the imputation process, even though parameter optimization techniques are often used at the prediction level, where they strongly affect the performance of SDEE techniques including ASEE. This paper proposes and evaluates an ensemble KNN imputation technique for ASEE. We then compare ASEE performance using ensemble KNN imputation with that obtained using either grid search-based single KNN imputation or KNN imputation without parameter optimization. For the six datasets used for comparison, ensemble KNN imputation significantly improved ASEE performance compared with KNN imputation without optimization. Moreover, ensemble KNN imputation and grid search-based imputation behaved similarly. Given that grid search is time-consuming, ensemble KNN imputation may be an alternative for dealing with missing data in the ASEE process.
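The idea of an ensemble of KNN imputers can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the individual imputers are configured or combined, so the function names (`knn_impute`, `ensemble_knn_impute`), the choice of varying only `k`, the Euclidean distance over shared features, and the averaging of the imputed values are all assumptions made for illustration.

```python
import math
from statistics import mean


def knn_impute(rows, k):
    """Fill None entries with the mean of the k nearest donor rows.

    Assumption: every column has at least one observed value, so a donor
    always exists for each missing entry.
    """
    def dist(a, b):
        # Euclidean distance over the features observed in both rows,
        # normalised by the number of shared features.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows where feature j is observed.
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            filled[i][j] = mean(v for _, v in donors[:k])
    return filled


def ensemble_knn_impute(rows, ks=(1, 3, 5)):
    """Average the outputs of several KNN imputers (hypothetical combination rule)."""
    candidates = [knn_impute(rows, k) for k in ks]
    n_rows, n_cols = len(rows), len(rows[0])
    return [
        [mean(c[i][j] for c in candidates) for j in range(n_cols)]
        for i in range(n_rows)
    ]


# Toy project data: [size, effort]; the last project has a missing effort value.
projects = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [2.0, None]]
completed = ensemble_knn_impute(projects, ks=(1, 3))
```

Unlike a grid search, which would evaluate each candidate `k` against the downstream estimator and keep the single best one, this sketch runs each imputer once and combines their outputs, which is where the potential time saving mentioned in the abstract would come from.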
Pages: 228-235
Page count: 8