The critical role of evaluation metrics in handling missing data in machine learning

Cited by: 0

Authors:

Atoum, Ibrahim [1]

Affiliation:

[1] Al-Zaytoonah University of Jordan, Faculty of Science and Information Technology, Department of Artificial Intelligence, Amman, Jordan

Source:

INTERNATIONAL JOURNAL OF ADVANCED AND APPLIED SCIENCES | 2025, Vol. 12, No. 01

Keywords:

Missing data handling; Machine learning models; Imputation techniques; Data completeness; Model performance evaluation

DOI:

10.21833/ijaas.2025.01.011

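Chinese Library Classification (CLC):

O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]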

Discipline classification codes:

07; 0710; 09

Abstract:

The presence of missing data in machine learning (ML) datasets remains a major challenge in building reliable models. This study explores various strategies to handle missing data and provides a framework to evaluate their effectiveness. The research focuses on commonly used techniques such as zero-filling, deletion, and imputation methods, including mean, median, mode, regression, k-nearest neighbors (KNN), and flagging. To assess these methods, a detailed evaluation framework is proposed, considering factors such as data completeness, model performance, stability, bias, variance, robustness to new data, computational efficiency, and domain-specific needs. This comprehensive approach allows for a thorough comparison of methods, helping to identify the most suitable technique for specific datasets and tasks. The findings highlight the importance of considering the unique features of the dataset and the goals of the analysis when choosing a method. While basic techniques like deletion and zero-filling may be effective in some cases, advanced imputation methods often preserve data quality and improve model accuracy. By applying the proposed evaluation criteria, researchers and practitioners can make better decisions on handling missing data, leading to more accurate, reliable, and adaptable ML models. (c) 2025 The Authors. Published by IASE. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
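
As a rough illustration of how a few of the techniques named in the abstract (zero-filling, deletion, mean/median/mode imputation, and flagging) might be compared against simple criteria such as data completeness and imputation error, here is a minimal sketch. It is not the paper's framework: the toy column, the masked positions, the helper names, and the choice of RMSE on the masked entries are all assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, median, mode

# Hypothetical toy column; the true values are known so imputation error can be measured.
truth = [2.0, 4.0, 4.0, 5.0, 10.0, 4.0, 6.0, 4.0]
missing_idx = {1, 4}  # positions masked for illustration only
observed = [None if i in missing_idx else v for i, v in enumerate(truth)]


def fill_constant(column, value):
    """Replace every missing entry (None) with a single fixed value."""
    return [value if x is None else x for x in column]


def delete_missing(column):
    """Deletion: keep only the entries that were actually observed."""
    return [x for x in column if x is not None]


def flag_missing(column):
    """Flagging: build an indicator column, 1 where the value was missing."""
    return [1 if x is None else 0 for x in column]


present = delete_missing(observed)

# Candidate imputations; the statistics are computed from observed values only.
candidates = {
    "zero": fill_constant(observed, 0.0),
    "mean": fill_constant(observed, mean(present)),
    "median": fill_constant(observed, median(present)),
    "mode": fill_constant(observed, mode(present)),
}


def rmse_on_masked(imputed):
    """Root-mean-square error restricted to the masked positions."""
    errors = [(imputed[i] - truth[i]) ** 2 for i in missing_idx]
    return sqrt(sum(errors) / len(errors))


scores = {name: rmse_on_masked(col) for name, col in candidates.items()}
completeness = len(present) / len(observed)  # share of entries observed
flags = flag_missing(observed)

for name, score in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:>6}: RMSE on masked entries = {score:.3f}")
print(f"completeness before imputation = {completeness:.2f}")
```

A single error score on one toy column says little on its own. The abstract's other criteria (model performance, stability, bias, variance, robustness, computational cost, domain needs) would each need their own measurement on real data.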

Pages: 112-124

Page count: 13
