The critical role of evaluation metrics in handling missing data in machine learning

Cited by: 0
Authors
Atoum, Ibrahim [1]
Affiliations
[1] Al Zaytoonah Univ Jordan, Fac Sci & Informat Technol, Dept Artificial Intelligence, Amman, Jordan
Keywords
Missing data handling; Machine learning models; Imputation techniques; Data completeness; Model performance evaluation;
DOI
10.21833/ijaas.2025.01.011
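Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07; 0710; 09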
Abstract
The presence of missing data in machine learning (ML) datasets remains a major challenge in building reliable models. This study explores various strategies to handle missing data and provides a framework to evaluate their effectiveness. The research focuses on commonly used techniques such as zero-filling, deletion, and imputation methods, including mean, median, mode, regression, k-nearest neighbors (KNN), and flagging. To assess these methods, a detailed evaluation framework is proposed, considering factors such as data completeness, model performance, stability, bias, variance, robustness to new data, computational efficiency, and domain-specific needs. This comprehensive approach allows for a thorough comparison of methods, helping to identify the most suitable technique for specific datasets and tasks. The findings highlight the importance of considering the unique features of the dataset and the goals of the analysis when choosing a method. While basic techniques like deletion and zero-filling may be effective in some cases, advanced imputation methods often preserve data quality and improve model accuracy. By applying the proposed evaluation criteria, researchers and practitioners can make better decisions on handling missing data, leading to more accurate, reliable, and adaptable ML models. (c) 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 112-124
Page count: 13
Related papers
50 in total
  • [21] Evaluation of Robustness Metrics for Defense of Machine Learning Systems
    DeMarchi, J.
    Rijken, R.
    Melrose, J.
    Madahar, B.
    Fumera, G.
    Roli, F.
    Ledda, E.
    Aktas, M.
    Kurth, F.
    Baggenstoss, P.
    Pelzer, B.
    Kanestad, L.
    2023 INTERNATIONAL CONFERENCE ON MILITARY COMMUNICATIONS AND INFORMATION SYSTEMS, ICMCIS, 2023,
  • [22] PROPOSAL FOR HANDLING MISSING DATA
    GLEASON, TC
    STAELIN, R
    PSYCHOMETRIKA, 1975, 40 (02) : 229 - 252
  • [23] Conservative handling of missing data
    Berger, Vance W.
    CONTEMPORARY CLINICAL TRIALS, 2012, 33 (03) : 460 - 460
  • [24] The prevention and handling of the missing data
    Kang, Hyun
    KOREAN JOURNAL OF ANESTHESIOLOGY, 2013, 64 (05) : 402 - 406
  • [25] Approximate Imputation Method for Missing Data in Machine Learning
    Cao W.
    Chu Y.
    Li X.
    1600, Xi'an Jiaotong University (51): : 142 - 148
  • [26] Regularized extreme learning machine for regression with missing data
    Yu, Qi
    Miche, Yoan
    Eirola, Emil
    van Heeswijk, Mark
    Severin, Eric
    Lendasse, Amaury
    NEUROCOMPUTING, 2013, 102 : 45 - 51
  • [27] Effective Handling of Missing Values in Datasets for Classification Using Machine Learning Methods
    Palanivinayagam, Ashokkumar
    Damasevicius, Robertas
    INFORMATION, 2023, 14 (02)
  • [28] Analysis of Machine Learning Based Imputation of Missing Data
    Rizvi, Syed Tahir Hussain
    Latif, Muhammad Yasir
    Amin, Muhammad Saad
    Telmoudi, Achraf Jabeur
    Shah, Nasir Ali
    CYBERNETICS AND SYSTEMS, 2023,
  • [29] A method of handling missing data in the context of learning Bayesian network structure
    Chen, Chong
    Yu, Hua
    Wang, Juyun
    APPLIED SCIENCE AND PRECISION ENGINEERING INNOVATION, PTS 1 AND 2, 2014, 479-480 : 906 - +
  • [30] A Provenance Meta Learning Framework for Missing Data Handling Methods Selection
    Liu, Qian
    Hauswirth, Manfred
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 349 - 358