Evaluating Imputation Techniques for Missing Data in ADNI: A Patient Classification Study

Cited by: 27
Authors
Campos, Sergio [1]
Pizarro, Luis [3]
Valle, Carlos [1]
Gray, Katherine R. [2]
Rueckert, Daniel [2]
Allende, Hector [1]
Affiliations
[1] Univ Tecn Federico Santa Maria, Dept Informat, Valparaiso, Chile
[2] UCL, Dept Comp Sci, London, England
[3] Univ London Imperial Coll Sci Technol & Med, Dept Comp, London, England
Source
PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015 | 2015 / Vol. 9423
Keywords
Missing data; Imputation; Classification; ADNI; Alzheimer; VALUES;
DOI
10.1007/978-3-319-25751-8_1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In real-world applications, data sets commonly contain records with missing values. Because many data analysis algorithms are not designed to work with missing data, all variables associated with such records are generally removed from the analysis. A better alternative is to use data imputation techniques, which estimate the missing values from statistical relationships among the variables. In this work, we test the imputation methods most commonly used in the literature for filling in missing records in the ADNI (Alzheimer's Disease Neuroimaging Initiative) data set. Missing data affect about 80% of the patients, so removing the incomplete records would discard most of the data. We measure the imputation error of the different techniques and then evaluate their impact on classification performance. For the task of discriminating among different stages of Alzheimer's disease, we train support vector machine and random forest classifiers on all the imputed data, as opposed to a reduced set of samples with complete records. Our results show the importance of using imputation procedures to achieve higher accuracy and robustness in classification.
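The abstract's step of measuring imputation error can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not name the imputation methods or the error metric, so column-mean imputation and RMSE are assumptions chosen for simplicity, and the function names and toy data are hypothetical. The idea is to hide known values, impute them, and compare the estimates against the hidden truth.

```python
import math
import random


def mask_values(rows, fraction, seed=0):
    """Hide a random fraction of entries (set to None); return the copy and hidden positions."""
    rng = random.Random(seed)
    cells = [(i, j) for i, row in enumerate(rows) for j in range(len(row))]
    hidden = rng.sample(cells, int(round(fraction * len(cells))))
    masked = [list(row) for row in rows]
    for i, j in hidden:
        masked[i][j] = None
    return masked, hidden


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column.

    Assumption: mean imputation stands in for whichever methods the paper tested.
    A column with no observed values falls back to 0.0.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def imputation_rmse(truth, imputed, hidden):
    """Root-mean-square error over the hidden positions only (assumed metric)."""
    squared = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: four complete records with two features each.
complete = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]
masked, hidden = mask_values(complete, fraction=0.25)
imputed = mean_impute(masked)
error = imputation_rmse(complete, imputed, hidden)
print(f"hidden cells: {hidden}, imputation RMSE: {error:.3f}")
```

The imputed matrix could then be passed to a classifier and compared with a model trained only on complete records, which is the comparison the abstract describes.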
Pages: 3-10
Page count: 8