The impact of imputation quality on machine learning classifiers for datasets with missing values

Cited by: 0
Authors
Tolou Shadbahr
Michael Roberts
Jan Stanczuk
Julian Gilbey
Philip Teare
Sören Dittmer
Matthew Thorpe
Ramon Viñas Torné
Evis Sala
Pietro Lió
Mishal Patel
Jacobus Preller
James H. F. Rudd
Tuomas Mirtti
Antti Sakari Rannikko
John A. D. Aston
Jing Tang
Carola-Bibiane Schönlieb
Affiliations
[1] University of Helsinki,Research Program in Systems Oncology, Faculty of Medicine
[2] University of Cambridge,Department of Applied Mathematics and Theoretical Physics
[3] Data Science & Artificial Intelligence,Department of Mathematics
[4] AstraZeneca,Department of Computer Science and Technology
[5] ZeTeM,Department of Radiology
[6] University of Bremen,Addenbrooke’s Hospital
[7] University of Manchester,Department of Medicine
[8] University of Cambridge,Department of Pathology
[9] University of Cambridge,Department of Urology
[10] Clinical Pharmacology & Safety Sciences,Department of Pure Mathematics and Mathematical Statistics
[11] AstraZeneca,Faculty of Mathematics
[12] Cambridge University Hospitals NHS Trust,Language Technology Laboratory
[13] University of Cambridge,Population Health and Genomics, School of Medicine
[14] University of Helsinki and Helsinki University Hospital,Department of Biomedical Imaging and Image
[15] iCAN-Digital Precision Cancer Medicine Flagship,guided Therapy
[16] University of Helsinki and Helsinki University Hospital,National Heart and Lung Institute
[17] University of Cambridge,Institute of Astronomy
[18] University of Vienna
[19] Royal Papworth Hospital
[20] Cambridge
[21] Royal Papworth Hospital NHS Foundation Trust
[22] University of Cambridge
[23] University of Dundee
[24] Computational Imaging Research Lab Medical University of Vienna
[25] Imperial College London
[26] contextflow GmbH
[27] University of Cambridge
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to ‘complete’ the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
Related papers
50 in total
  • [1] The impact of imputation quality on machine learning classifiers for datasets with missing values
    Shadbahr, Tolou
    Roberts, Michael
    Stanczuk, Jan
    Gilbey, Julian
    Teare, Philip
    Dittmer, Soeren
    Thorpe, Matthew
    Torne, Ramon Vinas
    Sala, Evis
    Lio, Pietro
    Patel, Mishal
    Preller, Jacobus
    Rudd, James H. F.
    Mirtti, Tuomas
    Rannikko, Antti Sakari
    Aston, John A. D.
    Tang, Jing
    Schonlieb, Carola-Bibiane
    COMMUNICATIONS MEDICINE, 2023, 3 (01):
  • [2] Imputation of missing values in lipidomic datasets
    Froelich, Nicolas
    Klose, Christian
    Widen, Elisabeth
    Ripatti, Samuli
    Gerl, Mathias J.
    PROTEOMICS, 2024, 24 (15)
  • [3] A Minimal Learning Machine for Datasets with Missing Values
    Paiva Mesquita, Diego P.
    Gomes, Joao Paulo P.
    Souza, Amauri H., Jr.
    NEURAL INFORMATION PROCESSING, PT I, 2015, 9489 : 565 - 572
  • [4] Machine Learning Based Missing Data Imputation in Categorical Datasets
    Ishaq, Muhammad
    Zahir, Sana
    Iftikhar, Laila
    Bulbul, Mohammad Farhad
    Rho, Seungmin
    Lee, Mi Young
    IEEE ACCESS, 2024, 12 : 88332 - 88344
  • [5] Water-Quality Data Imputation with a High Percentage of Missing Values: A Machine Learning Approach
    Rodriguez, Rafael
    Pastorini, Marcos
    Etcheverry, Lorena
    Chreties, Christian
    Fossati, Monica
    Castro, Alberto
    Gorgoglione, Angela
    SUSTAINABILITY, 2021, 13 (11)
  • [6] A Novel Approach for Dealing with Missing Values in Machine Learning Datasets with Discrete Values
    Abu-Soud, Saleh M.
    2019 INTERNATIONAL CONFERENCE ON COMPUTER AND INFORMATION SCIENCES (ICCIS), 2019, : 118 - 122
  • [7] Mathura (MBI)-A novel imputation measure for imputation of missing values in medical datasets
    Mathura Bai B.
    Mangathayaru N.
    Padmaja Rani B.
    Aljawarneh S.
    Recent Advances in Computer Science and Communications, 2021, 14 (05) : 1358 - 1369
  • [8] APPLICATION OF ASSOCIATION RULES IN MISSING VALUES IMPUTATION IN CATEGORICAL DATASETS
    Kaiser, Jiri
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON MODELLING AND SIMULATION 2010 IN PRAGUE (MS'10 PRAGUE), 2010, : 203 - 206
  • [9] Regression Imputation for Space-Time Datasets with Missing Values
    Plaia, Antonella
    Bondi, Anna Lisa
    DATA ANALYSIS AND CLASSIFICATION, 2010, : 465 - 472
  • [10] A Novel Approach for Imputation of Missing Values for Mining Medical Datasets
    UshaRani, Yelipe
    Sammulal, P.
    2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), 2015, : 721 - 728