The critical role of evaluation metrics in handling missing data in machine learning

Authors
Atoum, Ibrahim [1]
Affiliations
[1] Al Zaytoonah Univ Jordan, Fac Sci & Informat Technol, Dept Artificial Intelligence, Amman, Jordan
Keywords
Missing data handling; Machine learning models; Imputation techniques; Data completeness; Model performance evaluation
DOI
10.21833/ijaas.2025.01.011
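Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09
Abstract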
The presence of missing data in machine learning (ML) datasets remains a major challenge in building reliable models. This study explores various strategies to handle missing data and provides a framework to evaluate their effectiveness. The research focuses on commonly used techniques such as zero-filling, deletion, and imputation methods, including mean, median, mode, regression, k-nearest neighbors (KNN), and flagging. To assess these methods, a detailed evaluation framework is proposed, considering factors such as data completeness, model performance, stability, bias, variance, robustness to new data, computational efficiency, and domain-specific needs. This comprehensive approach allows for a thorough comparison of methods, helping to identify the most suitable technique for specific datasets and tasks. The findings highlight the importance of considering the unique features of the dataset and the goals of the analysis when choosing a method. While basic techniques like deletion and zero-filling may be effective in some cases, advanced imputation methods often preserve data quality and improve model accuracy. By applying the proposed evaluation criteria, researchers and practitioners can make better decisions on handling missing data, leading to more accurate, reliable, and adaptable ML models. (c) 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Pages: 112-124
Page count: 13
Related papers
50 in total
  • [1] Missing values handling for machine learning portfolios
    Chen, Andrew Y.
    McCoy, Jack
    JOURNAL OF FINANCIAL ECONOMICS, 2024, 155
  • [2] Active Learning for Handling Missing Data
    Tharwat, Alaa
    Schenck, Wolfram
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (02) : 3273 - 3287
  • [3] A Machine Learning Approach to Mental Disorder Prediction: Handling the Missing Data Challenge
    Mokheleli, Tsholofelo
    Bokaba, Tebogo
    Museba, Tinofirei
    Ntshingila, Nompumelelo
    EMERGING TECHNOLOGIES FOR DEVELOPING COUNTRIES, AFRICATEK 2023, 2024, 520 : 93 - 106
  • [4] A Comparative Study on Missing Data Handling Using Machine Learning for Human Activity Recognition
    Hossain, Tahera
    Inoue, Sozo
    2019 JOINT 8TH INTERNATIONAL CONFERENCE ON INFORMATICS, ELECTRONICS & VISION (ICIEV) AND 2019 3RD INTERNATIONAL CONFERENCE ON IMAGING, VISION & PATTERN RECOGNITION (ICIVPR) WITH INTERNATIONAL CONFERENCE ON ACTIVITY AND BEHAVIOR COMPUTING (ABC), 2019 : 124 - 129
  • [5] Handling high-dimensional data with missing values by modern machine learning techniques
    Chen, Sixia
    Xu, Chao
    JOURNAL OF APPLIED STATISTICS, 2023, 50 (03) : 786 - 804
  • [6] Missing Data Handling using Machine Learning for Human Activity Recognition on Mobile Device
    Prabowo, Okyza M.
    Mutijarsa, Kusprasapta
    Supangkat, Suhono Harso
    2016 INTERNATIONAL CONFERENCE ON ICT FOR SMART SOCIETY (ICISS), 2016, : 59 - 62
  • [7] Handling Missing Data with Graph Representation Learning
    You, Jiaxuan
    Ma, Xiaobai
    Ding, Daisy Yi
    Kochenderfer, Mykel
    Leskovec, Jure
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Evaluation of Machine Learning Classification Algorithms & Missing Data Imputation Techniques
    Nwulu, Nnamdi I.
    2017 INTERNATIONAL ARTIFICIAL INTELLIGENCE AND DATA PROCESSING SYMPOSIUM (IDAP), 2017,
  • [9] A survey on missing data in machine learning
    Tlamelo Emmanuel
    Thabiso Maupong
    Dimane Mpoeleng
    Thabo Semong
    Banyatsang Mphago
    Oteng Tabona
    Journal of Big Data, 8
  • [10] A survey on missing data in machine learning
    Emmanuel, Tlamelo
    Maupong, Thabiso
    Mpoeleng, Dimane
    Semong, Thabo
    Mphago, Banyatsang
    Tabona, Oteng
    JOURNAL OF BIG DATA, 2021, 8 (01)