Evaluating Imputation Methods for Missing Data in a MCI Dataset

Cited by: 1
Authors
Gomez-Valades Batanero, Alba [1]
Rincon Zamorano, Mariano [1]
Martinez Tomas, Rafael [1]
Guerrero Martin, Juan [1]
Affiliations
[1] Univ Nacl Educ Distancia, Madrid 28040, Spain
Source
ARTIFICIAL INTELLIGENCE IN NEUROSCIENCE: AFFECTIVE ANALYSIS AND HEALTH APPLICATIONS, PT I | 2022 / Vol. 13258
Keywords
Missing data; Imputation; Multiple imputation; Trials
DOI
10.1007/978-3-031-06242-1_44
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Missing data is a recurrent problem in experimental studies, particularly in clinical and sociodemographic longitudinal studies, because of dropout and the refusal of some subjects to answer or perform certain tests. Different strategies have been designed to deal with missing values, but incorrect treatment of missing data can bias the database in one or more parameters, compromising the viability of the database and of future studies. To address this, various imputation techniques have been developed over the last decades. However, there are no regulations or clear guidelines for handling these situations. In this study, we analyze and impute a real, incomplete database for the early detection of mild cognitive impairment (MCI), in which the loss of values on 3 main variables is strongly correlated with years of education. The imputation follows two strategies: assigning a ceiling score, on the assumption that those subjects would have scored poorly had they taken the test, and a multiple imputation by fully conditional specification. To determine whether any bias in mean or variance was introduced during imputation, the original database was compared with the imputed databases. Using a p-value threshold of 0.1, the database imputed by multiple imputation best preserved the information of the original database, making it the more appropriate imputation method for this MCI database.
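The two strategies named in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the data are synthetic, the names `edu`, `scores`, `impute_fixed` and `impute_multiple` are hypothetical, the fixed fill value is simply the lowest observed score, and the multiple imputation uses a bootstrapped linear regression on years of education. With only one incomplete variable, fully conditional specification reduces to a single conditional model; with several incomplete variables it would cycle through one such model per variable.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual_sd)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_fixed(scores, value):
    """Strategy 1: replace every missing score with one fixed value."""
    return [value if s is None else s for s in scores]


def impute_multiple(edu, scores, m, rng):
    """Strategy 2 (simplified): m completed copies of the data.

    Each copy refits the regression on a bootstrap sample of the observed
    cases and adds residual noise, so the copies differ from one another.
    """
    observed = [(e, s) for e, s in zip(edu, scores) if s is not None]
    completed = []
    for _ in range(m):
        boot = [rng.choice(observed) for _ in observed]
        a, b, sd = fit_line([e for e, _ in boot], [s for _, s in boot])
        completed.append([
            a + b * e + rng.gauss(0, sd) if s is None else s
            for e, s in zip(edu, scores)
        ])
    return completed


# Synthetic data (assumption): scores rise with years of education, and
# scores are missing far more often for subjects with little education.
rng = random.Random(0)
edu = [rng.randint(0, 20) for _ in range(300)]
full = [10 + e + rng.gauss(0, 3) for e in edu]
scores = [
    None if rng.random() < (0.6 if e < 6 else 0.05) else s
    for e, s in zip(edu, full)
]

observed_scores = [s for s in scores if s is not None]
fixed = impute_fixed(scores, min(observed_scores))
mi = impute_multiple(edu, scores, m=5, rng=rng)

# Compare mean and variance of the observed data with each imputed version.
pooled_mean = statistics.fmean([statistics.fmean(d) for d in mi])
pooled_var = statistics.fmean([statistics.variance(d) for d in mi])
print("observed:", statistics.fmean(observed_scores),
      statistics.variance(observed_scores))
print("fixed   :", statistics.fmean(fixed), statistics.variance(fixed))
print("multiple:", pooled_mean, pooled_var)
```

The sketch only reports means and variances. A formal comparison, such as the tests behind the 0.1 p-value threshold mentioned in the abstract, would be a separate step and is not reproduced here.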
Pages: 446-454
Page count: 9