The impact of imputation quality on machine learning classifiers for datasets with missing values

Cited by: 0
Authors
Tolou Shadbahr
Michael Roberts
Jan Stanczuk
Julian Gilbey
Philip Teare
Sören Dittmer
Matthew Thorpe
Ramon Viñas Torné
Evis Sala
Pietro Lió
Mishal Patel
Jacobus Preller
James H. F. Rudd
Tuomas Mirtti
Antti Sakari Rannikko
John A. D. Aston
Jing Tang
Carola-Bibiane Schönlieb
Affiliations
[1] University of Helsinki,Research Program in Systems Oncology, Faculty of Medicine
[2] University of Cambridge,Department of Applied Mathematics and Theoretical Physics
[3] Data Science & Artificial Intelligence,Department of Mathematics
[4] AstraZeneca,Department of Computer Science and Technology
[5] ZeTeM,Department of Radiology
[6] University of Bremen,Addenbrooke’s Hospital
[7] University of Manchester,Department of Medicine
[8] University of Cambridge,Department of Pathology
[9] University of Cambridge,Department of Urology
[10] Clinical Pharmacology & Safety Sciences,Department of Pure Mathematics and Mathematical Statistics
[11] AstraZeneca,Faculty of Mathematics
[12] Cambridge University Hospitals NHS Trust,Language Technology Laboratory
[13] University of Cambridge,Population Health and Genomics, School of Medicine
[14] University of Helsinki and Helsinki University Hospital,Department of Biomedical Imaging and Image
[15] iCAN-Digital Precision Cancer Medicine Flagship,guided Therapy
[16] University of Helsinki and Helsinki University Hospital,National Heart and Lung Institute
[17] University of Cambridge,Institute of Astronomy
[18] University of Vienna
[19] Royal Papworth Hospital
[20] Cambridge
[21] Royal Papworth Hospital NHS Foundation Trust
[22] University of Cambridge
[23] University of Dundee
[24] Computational Imaging Research Lab Medical University of Vienna
[25] Imperial College London
[26] contextflow GmbH
[27] University of Cambridge
Source
Keywords
DOI
Not available
Chinese Library Classification number
Subject classification code
Abstract
Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to ‘complete’ the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
Related papers
50 in total
  • [31] Single and Multiple Imputation Method to Replace Missing Values in Air Pollution Datasets: A Review
    Libasin, Zuraira
    Ul-Saufie, Ahmad Zia
    Ahmat, Hasfazilah
    Shaziayani, Wan Nur
    2ND INTERNATIONAL CONFERENCE ON GREEN ENVIRONMENTAL ENGINEERING AND TECHNOLOGY, 2020, 616
  • [32] Advanced methods for missing values imputation based on similarity learning
    Fouad, Khaled M.
    Ismail, Mahmoud M.
    Azar, Ahmad Taher
    Arafa, Mona M.
    PEERJ COMPUTER SCIENCE, 2021, 7
  • [33] Sequential imputation for missing values
    Verboven, Sabine
    Branden, Karlien Vanden
    Goos, Peter
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2007, 31 (5-6) : 320 - 327
  • [34] Multiple imputation of missing values
    Royston, Patrick
    STATA JOURNAL, 2004, 4 (03) : 227 - 241
  • [35] Advanced methods for missing values imputation based on similarity learning
    Fouad K.M.
    Ismail M.M.
    Azar A.T.
    Arafa M.M.
    Ismail, Mahmoud M. (mahmoud.ismael@fci.bu.edu.eg), 1600, PeerJ Inc. (07) : 1 - 38
  • [36] Machine-Learning-Based Imputation Method for Filling Missing Values in Ground Meteorological Observation Data
    Li, Cong
    Ren, Xupeng
    Zhao, Guohui
    ALGORITHMS, 2023, 16 (09)
  • [37] Impact of imputation of missing values on classification error for discrete data
    Farhangfar, Alireza
    Kurgan, Lukasz
    Dy, Jennifer
    PATTERN RECOGNITION, 2008, 41 (12) : 3692 - 3705
  • [38] Impact of machine learning-based imputation techniques on medical datasets- a comparative analysis
    Tiwaskar S.
    Rashid M.
    Gokhale P.
    Multimedia Tools and Applications, 2025, 84 (09) : 5905 - 5925
  • [39] Missing Data Imputation using Machine Learning Algorithm for Supervised Learning
    Cenitta, D.
    Arjunan, R. Vijaya
    Prema, K., V
    2021 INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND INFORMATICS (ICCCI), 2021
  • [40] Neural Models for Imputation of Missing Ozone Data in Air-Quality Datasets
    Arroyo, Angel
    Herrero, Alvaro
    Tricio, Veronica
    Corchado, Emilio
    Wozniak, Michal
    COMPLEXITY, 2018