Optimized fuzzy clustering-based k-nearest neighbors imputation for mixed missing data in software development effort estimation

Cited by: 2
Authors
Abnane, Ibtissam [1]
Idri, Ali [1, 2]
Abran, Alain [3]
Affiliations
[1] Mohammed V Univ, Software Project Management Res Team, ENSIAS, Rabat, Morocco
[2] Mohammed VI Polytech Univ, Ben Guerir, Morocco
[3] Univ Quebec, Dept Software Engn & Informat Technol, ETS, Montreal, PQ, Canada
Keywords
fuzzy logic; imputation; missing data; software development effort estimation; COST ESTIMATION; DATA SETS; ANALOGY; ALGORITHM; SYSTEMS
DOI
10.1002/smr.2529
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
Context: Software development effort estimation (SDEE) is one of the most challenging aspects of project management. The presence of missing data (MD) in software attributes makes SDEE even more complex. K-nearest neighbors imputation (KNNI) has been widely used in SDEE to deal with the MD issue. However, KNNI, in its classical form, has low tolerance to imprecision and uncertainty, especially when dealing with categorical features. For categorical attributes, KNNI takes a classical approach: it represents software attributes mainly with numbers or classical intervals and relies on similarity measures originally designed for numerical attributes.
Objectives: This paper evaluates the use of an optimized fuzzy clustering-based KNNI (FC-KNNI) and compares it with classical KNNI when dealing with mixed data in the context of SDEE.
Methods: We investigate the effect of two imputation techniques (FC-KNNI and KNNI) on five SDEE techniques: case-based reasoning (CBR), fuzzy case-based reasoning (F-CBR), support vector regression, multilayer perceptron, and reduced-error pruning tree. The evaluation is carried out on six publicly available SDEE datasets using two performance measures, standardized accuracy (SA) and Pred(0.25). The Wilcoxon statistical test is also performed to assess the significance of the results.
Results: The results are promising in the sense that using an imputation technique designed for mixed data is better than reusing methods originally designed for numerical data. We found that FC-KNNI significantly outperforms KNNI regardless of the SDEE technique and dataset used. Another important finding is that F-CBR improved the analogy process compared to CBR.
Conclusion: The introduction of fuzzy sets and fuzzy clustering in the analogy process improves its performance in terms of SA and Pred(0.25).
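The abstract names the two performance measures, SA and Pred(0.25), without defining them. The following is a minimal sketch of how they are commonly computed in the SDEE literature. It assumes the usual definitions: Pred(0.25) as the share of projects whose magnitude of relative error is at most 0.25, and SA as 1 - MAE / MAE_p0, where MAE_p0 is the mean absolute error of random guessing. The paper's exact formulation may differ. The function names, the number of random-guessing runs, and the sample effort values are illustrative and do not come from the paper.

```python
import random
from statistics import mean


def pred(actual, predicted, level=0.25):
    """Fraction of projects whose magnitude of relative error is <= level."""
    hits = sum(abs(a - p) / a <= level for a, p in zip(actual, predicted))
    return hits / len(actual)


def mae(actual, predicted):
    """Mean absolute error between actual and predicted efforts."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def standardized_accuracy(actual, predicted, runs=1000, seed=0):
    """SA = 1 - MAE / MAE_p0 (assumed definition, see lead-in).

    MAE_p0 is approximated by averaging the MAE of many random-guessing
    runs; in each run, every project is "predicted" with the actual
    effort of another, randomly chosen project.
    """
    rng = random.Random(seed)
    n = len(actual)
    guess_maes = []
    for _ in range(runs):
        guesses = [
            actual[rng.choice([j for j in range(n) if j != i])]
            for i in range(n)
        ]
        guess_maes.append(mae(actual, guesses))
    return 1 - mae(actual, predicted) / mean(guess_maes)


# Hypothetical efforts (e.g. person-hours), for illustration only.
actual = [100, 200, 400, 800]
predicted = [110, 260, 390, 500]
print(pred(actual, predicted))  # two of four estimates fall within 25%
print(standardized_accuracy(actual, predicted))
```

An SA close to 1 means the estimator is much better than random guessing, while a value near 0 means it is no better. Some papers report SA as a percentage rather than a fraction.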
Pages: 24