The impact of imputation quality on machine learning classifiers for datasets with missing values

Authors
Tolou Shadbahr
Michael Roberts
Jan Stanczuk
Julian Gilbey
Philip Teare
Sören Dittmer
Matthew Thorpe
Ramon Viñas Torné
Evis Sala
Pietro Lió
Mishal Patel
Jacobus Preller
James H. F. Rudd
Tuomas Mirtti
Antti Sakari Rannikko
John A. D. Aston
Jing Tang
Carola-Bibiane Schönlieb
Affiliations
[1] University of Helsinki,Research Program in Systems Oncology, Faculty of Medicine
[2] University of Cambridge,Department of Applied Mathematics and Theoretical Physics
[3] Data Science & Artificial Intelligence,Department of Mathematics
[4] AstraZeneca,Department of Computer Science and Technology
[5] ZeTeM,Department of Radiology
[6] University of Bremen,Addenbrooke’s Hospital
[7] University of Manchester,Department of Medicine
[8] University of Cambridge,Department of Pathology
[9] University of Cambridge,Department of Urology
[10] Clinical Pharmacology & Safety Sciences,Department of Pure Mathematics and Mathematical Statistics
[11] AstraZeneca,Faculty of Mathematics
[12] Cambridge University Hospitals NHS Trust,Language Technology Laboratory
[13] University of Cambridge,Population Health and Genomics, School of Medicine
[14] University of Helsinki and Helsinki University Hospital,Department of Biomedical Imaging and Image
[15] iCAN-Digital Precision Cancer Medicine Flagship,guided Therapy
[16] University of Helsinki and Helsinki University Hospital,National Heart and Lung Institute
[17] University of Cambridge,Institute of Astronomy
[18] University of Vienna
[19] Royal Papworth Hospital
[20] Cambridge
[21] Royal Papworth Hospital NHS Foundation Trust
[22] University of Cambridge
[23] University of Dundee
[24] Computational Imaging Research Lab Medical University of Vienna
[25] Imperial College London
[26] contextflow GmbH
[27] University of Cambridge
Abstract
Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to ‘complete’ the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
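As a rough illustration of the idea in the abstract, the sketch below hides a known value, fills the gap with the mean of the observed values, and scores the result with a root-mean-square error on the hidden position. Everything here is an assumption made for illustration: the function names `mean_impute` and `rmse_on_masked`, the toy numbers, and the choice of mean imputation and RMSE. The abstract does not describe the authors' new quality measure, so this is a generic stand-in, not their method.

```python
import math
import statistics


def mean_impute(column):
    """Replace None entries with the mean of the observed entries (hypothetical helper)."""
    observed = [v for v in column if v is not None]
    fill = statistics.fmean(observed)
    return [fill if v is None else v for v in column]


def rmse_on_masked(truth, imputed, masked_idx):
    """Root-mean-square error over the positions that were hidden (hypothetical helper)."""
    squared = [(truth[i] - imputed[i]) ** 2 for i in masked_idx]
    return math.sqrt(sum(squared) / len(squared))


# Toy feature column with one deliberately hidden value (illustrative data only).
truth = [1.0, 2.0, 3.0, 4.0, 10.0]
masked_idx = [4]
with_gaps = [None if i in masked_idx else v for i, v in enumerate(truth)]

imputed = mean_impute(with_gaps)
error = rmse_on_masked(truth, imputed, masked_idx)
print(imputed, error)
```

In this toy case the hidden value is an outlier, so the mean fill lands far from it. A single error number like this says little about whether the imputed data keep the shape of the original distribution, which is one reason assessing imputation quality is harder than it first appears.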
Related papers
50 items in total
  • [21] ILA4: Overcoming missing values in machine learning datasets - An inductive learning approach
    Elhassan, Ammar
    Abu-Soud, Saleh M.
    Alghanim, Firas
    Salameh, Walid
    JOURNAL OF KING SAUD UNIVERSITY-COMPUTER AND INFORMATION SCIENCES, 2022, 34 (07) : 4284 - 4295
  • [22] Active learning with missing values considering imputation uncertainty
    Han, Jongmin
    Kang, Seokho
    KNOWLEDGE-BASED SYSTEMS, 2021, 224
  • [23] Semi-supervised learning with missing values imputation
    Huang, Buliao
    Zhu, Yunhui
    Usman, Muhammad
    Chen, Huanhuan
    KNOWLEDGE-BASED SYSTEMS, 2024, 284
  • [24] Proactive missing values imputation based on reinforcement learning
    Fountas, Panagiotis
    Kolomvatsos, Kostas
    COMPUTING, 2025, 107 (04)
  • [25] A Comparison of Machine Learning Classifiers Applied to Financial Datasets
    Robles-Granda, Pablo D.
    Belik, Ivan V.
    WORLD CONGRESS ON ENGINEERING AND COMPUTER SCIENCE, VOLS 1 AND 2, 2010 : 454 - 459
  • [26] Analysis of Machine Learning Based Imputation of Missing Data
    Rizvi, Syed Tahir Hussain
    Latif, Muhammad Yasir
    Amin, Muhammad Saad
    Telmoudi, Achraf Jabeur
    Shah, Nasir Ali
    CYBERNETICS AND SYSTEMS, 2023
  • [27] Machine learning imputation of missing Mesonet temperature observations
    Boomgard-Zagrodnik, Joseph P.
    Brown, David J.
    COMPUTERS AND ELECTRONICS IN AGRICULTURE, 2022, 192
  • [28] Approximate Imputation Method for Missing Data in Machine Learning
    Cao W.
    Chu Y.
    Li X.
    1600, Xi'an Jiaotong University (51): : 142 - 148
  • [29] Methods for imputation of missing values in air quality data sets
    Junninen, H
    Niska, H
    Tuppurainen, K
    Ruuskanen, J
    Kolehmainen, M
    ATMOSPHERIC ENVIRONMENT, 2004, 38 (18) : 2895 - 2907
  • [30] Machine Learning Based Method for Insurance Fraud Detection on Class Imbalance Datasets With Missing Values
    Khalil, Ahmed A.
    Liu, Zaiming
    Fathalla, Ahmed
    Ali, Ahmed
    Salah, Ahmad
    IEEE ACCESS, 2024, 12 : 155451 - 155468