Fuzzy C-mean Missing Data Imputation for Analogy-based Effort Estimation

Cited by: 0
Authors
AlMutlaq, Ayman Jalal [1]
Jawawi, Dayang N. A. [1]
Arbain, Adila Firdaus Binti [1]
Affiliations
[1] Univ Teknol Malaysia, Fac Engn, Sch Comp, Dept Software Engn, Johor Baharu, Malaysia
Keywords
Analogy-based effort estimation; imputation; missing data; fuzzy c-mean; SOFTWARE COST ESTIMATION; DATA SETS; REGRESSION; SELECTION;
DOI
10.14569/IJACSA.2021.0120874
Chinese Library Classification (CLC) number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
The accuracy of effort estimation is one of the major factors in the success or failure of software projects. Analogy-Based Estimation (ABE) is a widely accepted estimation model because it follows human nature in selecting analogies that are similar to the target project. The prediction accuracy of the ABE model is strongly associated with the quality of the dataset, since it depends on previously completed projects for estimation. Missing Data (MD) is one of the major challenges in software engineering datasets. Researchers have investigated several missing data imputation techniques for the ABE model. Identifying the most similar donor values from the dataset of completed software projects is a challenging issue in the existing missing data techniques adopted for the ABE model. In this study, Fuzzy C-Mean Imputation (FCMI), Mean Imputation (MI), and K-Nearest Neighbor Imputation (KNNI) are investigated to impute missing values in the Desharnais dataset under different missing data percentages (Desh-Miss1, Desh-Miss2) for the ABE model. The FCMI-ABE technique is proposed in this study. MI, KNNI, and ABE-FCMI are compared on the ABE model to identify the most suitable MD imputation method. The results suggest that ABE-FCMI, rather than MI or KNNI, imputes more reliable values for incomplete software projects in the datasets with missing values. It was also found that the proposed imputation method significantly improves the software development effort prediction of the ABE model.
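The abstract names the techniques without giving their details, so the following is only a minimal sketch of how mean imputation, a fuzzy c-means style imputation, and an analogy-based estimate might fit together. It is not the paper's FCMI-ABE method. The function names (`mean_impute`, `fcm_impute`, `abe_estimate`), the parameters (`n_clusters`, `m`, `n_iter`, `seed`, `k`), the choice of Euclidean distance, and the toy data are all assumptions made for illustration; the Desharnais dataset is not used here.

```python
import numpy as np


def mean_impute(X):
    """Mean imputation: fill each NaN with the mean of its column's observed values."""
    X = np.asarray(X, dtype=float)
    col_means = np.nanmean(X, axis=0)
    return np.where(np.isnan(X), col_means, X)


def fcm_impute(X, n_clusters=2, m=2.0, n_iter=100, seed=0):
    """Fuzzy c-means style imputation (illustrative variant, assumed here).

    Missing cells start at the column mean and are then repeatedly replaced
    by a membership-weighted average of the cluster centers.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = mean_impute(X)  # starting guess for the missing cells
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)  # each row of memberships sums to 1
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers: membership-weighted means of the current data.
        centers = (W.T @ filled) / W.sum(axis=0)[:, None]
        # Squared Euclidean distance of every row to every center.
        d2 = ((filled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.maximum(d2, 1e-12)  # avoid division by zero
        # Standard fuzzy c-means membership update with fuzzifier m.
        inv = d2 ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Re-estimate only the missing cells; observed cells stay untouched.
        W = U ** m
        estimate = (W @ centers) / W.sum(axis=1, keepdims=True)
        filled = np.where(missing, estimate, X)
    return filled


def abe_estimate(features, efforts, target, k=2):
    """Analogy-based estimate: mean effort of the k closest completed projects."""
    features = np.asarray(features, dtype=float)
    efforts = np.asarray(efforts, dtype=float)
    dist = np.linalg.norm(features - np.asarray(target, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    return float(efforts[nearest].mean())


# Toy data (made up): two groups of projects, one feature value missing.
X_demo = np.array([
    [1.0, 2.0],
    [1.2, 2.1],
    [0.9, np.nan],
    [8.0, 9.0],
    [8.2, 9.1],
    [7.9, 8.8],
])
print("mean imputation:", mean_impute(X_demo)[2, 1])
print("fuzzy c-means imputation:", fcm_impute(X_demo)[2, 1])
```

In this toy case the column mean is pulled toward the larger group of projects, while the cluster-based estimate stays close to the values of the projects that resemble the incomplete one. This is the kind of "similar donor" behaviour the abstract refers to. After imputation, the completed rows could be passed to `abe_estimate` as the historical projects.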
Pages: 628-640
Number of pages: 13
Related papers
55 in total
[1]  
Abnane I, 2016, PROCEEDINGS OF 2016 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI)
[2]   Fuzzy case-based-reasoning-based imputation for incomplete data in software engineering repositories [J].
Abnane, Ibtissam ;
Idri, Ali ;
Abran, Alain .
JOURNAL OF SOFTWARE-EVOLUTION AND PROCESS, 2020, 32 (09)
[3]   Analogy Software Effort Estimation Using Ensemble KNN Imputation [J].
Abnane, Ibtissam ;
Hosni, Mohamed ;
Idri, Ali ;
Abran, Alain .
2019 45TH EUROMICRO CONFERENCE ON SOFTWARE ENGINEERING AND ADVANCED APPLICATIONS (SEAA 2019), 2019, :228-235
[4]   Improved Analogy-based Effort Estimation with Incomplete Mixed Data [J].
Abnane, Ibtissam ;
Idri, Ali .
PROCEEDINGS OF THE 2018 FEDERATED CONFERENCE ON COMPUTER SCIENCE AND INFORMATION SYSTEMS (FEDCSIS), 2018, :1015-1024
[5]  
Almutlaq A.J.H., 2019, INT C REL INF COMM T
[6]   A simulation tool for efficient analogy based cost estimation [J].
Angelis L. ;
Stamelos I. .
Empirical Software Engineering, 2000, 5 (1) :35-68
[7]   A hybrid method for imputation of missing values using optimized fuzzy c-means with support vector regression and a genetic algorithm [J].
Aydilek, Ibrahim Berkan ;
Arslan, Ahmet .
INFORMATION SCIENCES, 2013, 233 :25-35
[8]  
Banerjee A, 2005, J MACH LEARN RES, V6, P1705
[9]   Nearest neighbor imputation algorithms: a critical evaluation [J].
Beretta, Lorenzo ;
Santaniello, Alessandro .
BMC MEDICAL INFORMATICS AND DECISION MAKING, 2016, 16
[10]   FCM - THE FUZZY C-MEANS CLUSTERING-ALGORITHM [J].
BEZDEK, JC ;
EHRLICH, R ;
FULL, W .
COMPUTERS & GEOSCIENCES, 1984, 10 (2-3) :191-203