The impact of imputation quality on machine learning classifiers for datasets with missing values

Cited by: 0
Authors
Tolou Shadbahr
Michael Roberts
Jan Stanczuk
Julian Gilbey
Philip Teare
Sören Dittmer
Matthew Thorpe
Ramon Viñas Torné
Evis Sala
Pietro Lió
Mishal Patel
Jacobus Preller
James H. F. Rudd
Tuomas Mirtti
Antti Sakari Rannikko
John A. D. Aston
Jing Tang
Carola-Bibiane Schönlieb
Affiliations
[1] University of Helsinki, Research Program in Systems Oncology, Faculty of Medicine
[2] University of Cambridge, Department of Applied Mathematics and Theoretical Physics
[3] Data Science & Artificial Intelligence, Department of Mathematics
[4] AstraZeneca, Department of Computer Science and Technology
[5] ZeTeM, Department of Radiology
[6] University of Bremen, Addenbrooke’s Hospital
[7] University of Manchester, Department of Medicine
[8] University of Cambridge, Department of Pathology
[9] University of Cambridge, Department of Urology
[10] Clinical Pharmacology & Safety Sciences, Department of Pure Mathematics and Mathematical Statistics
[11] AstraZeneca, Faculty of Mathematics
[12] Cambridge University Hospitals NHS Trust, Language Technology Laboratory
[13] University of Cambridge, Population Health and Genomics, School of Medicine
[14] University of Helsinki and Helsinki University Hospital, Department of Biomedical Imaging and Image
[15] iCAN-Digital Precision Cancer Medicine Flagship, guided Therapy
[16] University of Helsinki and Helsinki University Hospital, National Heart and Lung Institute
[17] University of Cambridge, Institute of Astronomy
[18] University of Vienna
[19] Royal Papworth Hospital
[20] Cambridge
[21] Royal Papworth Hospital NHS Foundation Trust
[22] University of Cambridge
[23] University of Dundee
[24] Computational Imaging Research Lab Medical University of Vienna
[25] Imperial College London
[26] contextflow GmbH
[27] University of Cambridge
DOI: not available
Abstract
Many artificial intelligence (AI) methods aim to classify samples of data into groups, e.g., patients with disease vs. those without. This often requires datasets to be complete, i.e., that all data has been collected for all samples. However, in clinical practice this is often not the case and some data can be missing. One solution is to ‘complete’ the dataset using a technique called imputation to replace those missing values. However, assessing how well the imputation method performs is challenging. In this work, we demonstrate why people should care about imputation, develop a new method for assessing imputation quality, and demonstrate that if we build AI models on poorly imputed data, the model can give different results to those we would hope for. Our findings may improve the utility and quality of AI models in the clinic.
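As a rough illustration of what "imputation" and "assessing imputation quality" involve, the sketch below fills each missing value with its column mean and then scores the imputer the conventional way: hide some entries whose true values are known, impute them, and compute the root-mean-square error (RMSE) on just those entries. This is a minimal sketch under stated assumptions. Mean imputation, masked RMSE, and the helper names `mean_impute` and `masked_rmse` are hypothetical choices made for illustration; this is not the new assessment method the abstract refers to, which this record does not describe.

```python
import math


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Hypothetical helper: mean imputation is used only as the simplest example.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        if not observed:
            raise ValueError(f"column {j} has no observed values")
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def masked_rmse(complete, hide):
    """Hide the (row, col) entries listed in `hide`, impute them, and return the
    RMSE computed only on those hidden entries, whose true values are known.

    Hypothetical helper illustrating a conventional baseline check.
    """
    hide = set(hide)
    if not hide:
        raise ValueError("need at least one hidden entry")
    # Build a copy of the complete data with the chosen entries set to None.
    masked = [
        [None if (i, j) in hide else v for j, v in enumerate(row)]
        for i, row in enumerate(complete)
    ]
    imputed = mean_impute(masked)
    # Compare imputed values against the ground truth at the hidden positions only.
    sq_err = [(imputed[i][j] - complete[i][j]) ** 2 for i, j in hide]
    return math.sqrt(sum(sq_err) / len(sq_err))


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [6.0, 40.0]]
    # Hiding row 3, column 0: the column mean of 1, 2, 3 is 2.0 versus a true 6.0.
    print(masked_rmse(data, [(3, 0)]))  # 4.0
```

Mean imputation also shrinks each column's variance, which a per-entry error score such as this one does not reveal on its own; this is one reason a single error number may be an incomplete picture of imputation quality.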