The critical role of evaluation metrics in handling missing data in machine learning

Author
Atoum, Ibrahim [1]
Affiliation
[1] Al Zaytoonah Univ Jordan, Fac Sci & Informat Technol, Dept Artificial Intelligence, Amman, Jordan
Keywords
Missing data handling; Machine learning models; Imputation techniques; Data completeness; Model performance evaluation
DOI
10.21833/ijaas.2025.01.011
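Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract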
The presence of missing data in machine learning (ML) datasets remains a major challenge in building reliable models. This study explores various strategies to handle missing data and provides a framework to evaluate their effectiveness. The research focuses on commonly used techniques such as zero-filling, deletion, and imputation methods, including mean, median, mode, regression, k-nearest neighbors (KNN), and flagging. To assess these methods, a detailed evaluation framework is proposed, considering factors such as data completeness, model performance, stability, bias, variance, robustness to new data, computational efficiency, and domain-specific needs. This comprehensive approach allows for a thorough comparison of methods, helping to identify the most suitable technique for specific datasets and tasks. The findings highlight the importance of considering the unique features of the dataset and the goals of the analysis when choosing a method. While basic techniques like deletion and zero-filling may be effective in some cases, advanced imputation methods often preserve data quality and improve model accuracy. By applying the proposed evaluation criteria, researchers and practitioners can make better decisions on handling missing data, leading to more accurate, reliable, and adaptable ML models. (c) 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
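As a rough illustration of the kind of comparison the abstract describes, the sketch below applies a few of the named techniques (deletion, zero-filling, mean and median imputation, flagging) to one toy column and scores them on two simple criteria: data completeness and imputation error on the masked entries. The function names, the toy data, and the choice of RMSE as the error measure are assumptions made for this example; they are not taken from the paper, and the paper's framework covers more criteria than are shown here.

```python
from statistics import mean, median

def delete_missing(values):
    """Deletion: drop entries that are missing (None)."""
    return [v for v in values if v is not None]

def fill_constant(values, constant=0.0):
    """Zero-filling (or any constant): replace each missing entry."""
    return [constant if v is None else v for v in values]

def fill_statistic(values, statistic):
    """Mean/median imputation: fill with a statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = statistic(observed)
    return [fill if v is None else v for v in values]

def flag_missing(values):
    """Flagging: indicator column, 1 where the entry was missing."""
    return [1 if v is None else 0 for v in values]

def completeness(values):
    """Fraction of entries that are observed."""
    return sum(v is not None for v in values) / len(values)

def imputation_rmse(truth, observed, imputed):
    """RMSE computed only over the positions that were missing."""
    sq_errors = [(t - i) ** 2
                 for t, o, i in zip(truth, observed, imputed)
                 if o is None]
    return (sum(sq_errors) / len(sq_errors)) ** 0.5

# Hypothetical toy column: the true values, and a copy with two entries masked.
truth = [2.0, 4.0, 6.0, 8.0, 100.0]
observed = [2.0, None, 6.0, None, 100.0]

results = {
    "zero": imputation_rmse(truth, observed, fill_constant(observed, 0.0)),
    "mean": imputation_rmse(truth, observed, fill_statistic(observed, mean)),
    "median": imputation_rmse(truth, observed, fill_statistic(observed, median)),
}

if __name__ == "__main__":
    print("completeness:", completeness(observed))
    print("rows kept after deletion:", len(delete_missing(observed)))
    print("missing flags:", flag_missing(observed))
    for name, rmse in sorted(results.items(), key=lambda kv: kv[1]):
        print(f"{name:>6} imputation RMSE: {rmse:.2f}")
```

On this toy column the outlier pulls the mean far from the masked values, so median imputation scores best and mean imputation worst. A different column could rank the methods differently, which is consistent with the abstract's point that the choice depends on the dataset and the goals of the analysis.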
Pages: 112-124
Number of pages: 13