The critical role of evaluation metrics in handling missing data in machine learning

Cited by: 0
Authors
Atoum, Ibrahim [1]
Affiliations
[1] Al Zaytoonah Univ Jordan, Fac Sci & Informat Technol, Dept Artificial Intelligence, Amman, Jordan
Keywords
Missing data handling; Machine learning models; Imputation techniques; Data completeness; Model performance evaluation;
DOI
10.21833/ijaas.2025.01.011
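Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]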
Discipline classification code
07; 0710; 09
Abstract
The presence of missing data in machine learning (ML) datasets remains a major challenge in building reliable models. This study explores various strategies to handle missing data and provides a framework to evaluate their effectiveness. The research focuses on commonly used techniques such as zero-filling, deletion, and imputation methods, including mean, median, mode, regression, k-nearest neighbors (KNN), and flagging. To assess these methods, a detailed evaluation framework is proposed, considering factors such as data completeness, model performance, stability, bias, variance, robustness to new data, computational efficiency, and domain-specific needs. This comprehensive approach allows for a thorough comparison of methods, helping to identify the most suitable technique for specific datasets and tasks. The findings highlight the importance of considering the unique features of the dataset and the goals of the analysis when choosing a method. While basic techniques like deletion and zero-filling may be effective in some cases, advanced imputation methods often preserve data quality and improve model accuracy. By applying the proposed evaluation criteria, researchers and practitioners can make better decisions on handling missing data, leading to more accurate, reliable, and adaptable ML models. (c) 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
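As a rough illustration of the simpler strategies the abstract names (deletion, zero-filling, mean and median imputation, and flagging), the sketch below applies each to a single numeric column and reports a basic data-completeness ratio. It is not taken from the paper: the function names, the use of `None` to mark a missing value, and the toy `ages` column are all assumptions made for this example, and the paper's own evaluation framework covers many more criteria than completeness.

```python
from statistics import mean, median

# Assumption: a missing entry is represented by None.

def completeness(column):
    """Fraction of entries in the column that are not missing."""
    return sum(v is not None for v in column) / len(column)

def delete_missing(column):
    """Deletion: drop every missing entry."""
    return [v for v in column if v is not None]

def fill_constant(column, value=0.0):
    """Zero-filling (or any constant): replace missing entries with `value`."""
    return [value if v is None else v for v in column]

def impute(column, statistic=mean):
    """Replace missing entries with a statistic of the observed values."""
    fill = statistic(delete_missing(column))
    return [fill if v is None else v for v in column]

def flag_missing(column):
    """Flagging: indicator column, 1 where the value was missing, else 0."""
    return [1 if v is None else 0 for v in column]

# Hypothetical toy data, invented for this example.
ages = [20.0, None, 30.0, 100.0, None]

print("completeness:", completeness(ages))
print("deletion:    ", delete_missing(ages))
print("zero-fill:   ", fill_constant(ages))
print("mean:        ", impute(ages, mean))
print("median:      ", impute(ages, median))
print("flags:       ", flag_missing(ages))
```

On this toy column the mean fill (50.0) is pulled upward by the outlier 100.0 while the median fill (30.0) is not, which is one small example of why the choice of method depends on the dataset.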
Pages: 112-124
Number of pages: 13
Related papers
50 in total
  • [41] Sample-Based Extreme Learning Machine with Missing Data
    Gao, Hang
    Liu, Xin-Wang
    Peng, Yu-Xing
    Jian, Song-Lei
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2015, 2015
  • [42] Machine-learning-based particle identification with missing data
    Kasak, Milosz
    Deja, Kamil
    Karwowska, Maja
    Jakubowska, Monika
    Graczykowski, Lukasz
    Janik, Malgorzata
    EUROPEAN PHYSICAL JOURNAL C, 2024, 84 (07):
  • [43] Applying Machine Learning and Data Fusion to the "Missing Person" Problem
    Solaiman, K. M. A.
    Sun, Tao
    Nesen, Alina
    Bhargava, Bharat
    Stonebraker, Michael
    COMPUTER, 2022, 55 (06) : 40 - 55
  • [44] MACHINE LEARNING FOR IMPUTING MISSING PHARMACY COSTS IN CLAIMS DATA
    Vojjala, S. K.
    Barron, J.
    Kumar, A.
    Grabner, M.
    Eshete, B.
    Tan, H.
    Willey, V
    VALUE IN HEALTH, 2023, 26 (06) : S5 - S5
  • [45] Addressing Missing Environmental Data via a Machine Learning Scheme
    Tzanis, Chris G.
    Alimissis, Anastasios
    Koutsogiannis, Ioannis
    ATMOSPHERE, 2021, 12 (04)
  • [46] ExtraImpute: A Novel Machine Learning Method for Missing Data Imputation
    Alabadla, Mustafa
    Sidi, Fatimah
    Ishak, Iskandar
    Ibrahim, Hamidah
    Affendey, Lilly Suriani
    Hamdan, Hazlina
    JOURNAL OF ADVANCES IN INFORMATION TECHNOLOGY, 2022, 13 (05) : 470 - 476
  • [47] On Handling Missing Values in Data Stream Mining Algorithms Based on the Restricted Boltzmann Machine
    Jaworski, Maciej
    Duda, Piotr
    Rutkowska, Danuta
    Rutkowski, Leszek
    NEURAL INFORMATION PROCESSING, ICONIP 2019, PT V, 2019, 1143 : 347 - 354
  • [48] Handling Missing Data with Markov Boundary
    Mohammed, Azhar
    Nguyen, Dang
    Duong, Bao
    Nichols, Melanie
    Nguyen, Thin
    ADVANCED DATA MINING AND APPLICATIONS (ADMA 2022), PT I, 2022, 13725 : 319 - 333
  • [49] Best Practices for Handling Missing Data
    Srijan, Shukla
    Rajagopalan, Iyer R.
    ANNALS OF SURGICAL ONCOLOGY, 2024, 31 (01) : 12 - 13
  • [50] Handling Missing Data in CGM Records
    Zulj, Sara
    Carvalho, Paulo
    Ribeiro, Rogerio
    Magjarevic, Ratko
    FUTURE TRENDS IN BIOMEDICAL AND HEALTH INFORMATICS AND CYBERSECURITY IN MEDICAL DEVICES, ICBHI 2019, 2020, 74 : 420 - 427