Imputation of missing clinical, cognitive and neuroimaging data of Dementia using missForest, a Random Forest based algorithm

Cited by: 5
Authors
Aracri, Federica [1]
Bianco, Maria Giovanna [1]
Quattrone, Andrea [1]
Sarica, Alessia [1]
Affiliations
[1] Magna Graecia Univ Catanzaro, Neurosci Res Ctr, Dept Med & Surg, Catanzaro, Italy
Source
2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS | 2023
Funding
Canadian Institutes of Health Research; National Institutes of Health (USA);
Keywords
Imputation; MissForest algorithm; Mean imputation algorithm; ADNI dataset; Alzheimer's disease;
DOI
10.1109/CBMS58004.2023.00300
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Missing values are often encountered in international neuroscience and neuroimaging databases. Because many statistical methods and Machine Learning (ML) algorithms are not designed to work with missing data, all variables associated with incomplete records are usually removed, which loses information and degrades the classification of neurodegenerative diseases such as dementia. A reliable alternative is imputation, which substitutes missing values, for example with the mean (I-mean), a widely applied approach. Recently, missForest (MF), a Random Forest-based algorithm, has become popular for handling missing data in biomedical research. We therefore aimed to assess the reliability of MF in solving the missingness problem in a cohort of Mild Cognitive Impairment (MCI) and Alzheimer's disease (AD) patients from the international Alzheimer's Disease Neuroimaging Initiative (ADNI) database, with clinical, cognitive and neuroimaging features. First, we amputed the complete dataset with an increasing percentage of missing data (from 10% to 80%) under a Missing Completely At Random (MCAR) mechanism. Then, we applied the I-mean and MF approaches to the amputed datasets and compared their imputation errors (RMSE, NRMSE, MAE). When the average error over all features was considered, MF outperformed I-mean at every amputation percentage. However, when errors on single features were compared, MF performed slightly worse than I-mean on the cognitive features ADAS, RAVLT and MMSE, regardless of the amputation percentage. We conclude that missForest proved to be a reliable imputation algorithm for handling missing neuroscience data, although it should be used with caution on highly skewed variables, such as cognitive scores.
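The evaluation loop described in the abstract (ampute a complete dataset under MCAR, impute, then score the imputed cells against the held-back truth) can be sketched as follows. This is a minimal illustration, not the authors' code: it uses synthetic Gaussian data rather than ADNI, implements only the mean-imputation baseline (missForest itself is not reproduced here), and assumes NRMSE is the RMSE divided by the standard deviation of the true values, which is one of several conventions. The function names `ampute_mcar`, `impute_mean` and `imputation_errors` are hypothetical.

```python
import math
import random
import statistics


def ampute_mcar(rows, fraction, rng):
    """Hide `fraction` of all cells, chosen uniformly at random (MCAR)."""
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    masked = set(rng.sample(cells, round(fraction * len(cells))))
    amputed = [
        [None if (i, j) in masked else value for j, value in enumerate(row)]
        for i, row in enumerate(rows)
    ]
    return amputed, masked


def impute_mean(amputed):
    """Replace each missing cell with the mean of the observed values in its column."""
    means = []
    for j in range(len(amputed[0])):
        observed = [row[j] for row in amputed if row[j] is not None]
        # Falling back to 0.0 for a fully missing column is an arbitrary choice.
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [
        [means[j] if value is None else value for j, value in enumerate(row)]
        for row in amputed
    ]


def imputation_errors(complete, imputed, masked):
    """Return (RMSE, NRMSE, MAE) computed on the amputed cells only."""
    cells = sorted(masked)
    truth = [complete[i][j] for i, j in cells]
    diffs = [imputed[i][j] - complete[i][j] for i, j in cells]
    rmse = math.sqrt(statistics.fmean(d * d for d in diffs))
    mae = statistics.fmean(abs(d) for d in diffs)
    # Assumption: normalise by the standard deviation of the true values.
    spread = statistics.pstdev(truth)
    nrmse = rmse / spread if spread > 0 else float("nan")
    return rmse, nrmse, mae


# Synthetic stand-in for a complete dataset (200 records, 3 features).
rng = random.Random(0)
complete = [
    [rng.gauss(50, 10), rng.gauss(25, 3), rng.gauss(1.5, 0.2)]
    for _ in range(200)
]

for fraction in (0.1, 0.4, 0.8):
    amputed, masked = ampute_mcar(complete, fraction, rng)
    imputed = impute_mean(amputed)
    rmse, nrmse, mae = imputation_errors(complete, imputed, masked)
    print(f"{fraction:.0%} missing: RMSE={rmse:.3f} NRMSE={nrmse:.3f} MAE={mae:.3f}")
```

The errors here are pooled over all features, so columns with larger scales dominate; a per-feature comparison like the one reported for ADAS, RAVLT and MMSE would call `imputation_errors` on the masked cells of one column at a time.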
Pages: 684-688
Page count: 5
Related papers
21 in total
  • [1] Cortical atrophy distinguishes idiopathic normal-pressure hydrocephalus from progressive supranuclear palsy: A machine learning approach
    Bianco, Maria Giovanna
    Quattrone, Andrea
    Sarica, Alessia
    Vescio, Basilio
    Buonocore, Jolanda
    Vaccaro, Maria Grazia
    Aracri, Federica
    Calomino, Camilla
    Gramigna, Vera
    Quattrone, Aldo
    [J]. PARKINSONISM & RELATED DISORDERS, 2022, 103 : 7 - 14
  • [2] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [3] Evaluating Imputation Techniques for Missing Data in ADNI: A Patient Classification Study
    Campos, Sergio
    Pizarro, Luis
    Valle, Carlos
    Gray, Katherine R.
    Rueckert, Daniel
    Allende, Hector
    [J]. PROGRESS IN PATTERN RECOGNITION, IMAGE ANALYSIS, COMPUTER VISION, AND APPLICATIONS, CIARP 2015, 2015, 9423 : 3 - 10
  • [4] Cihan P., 2019, SAK U J SCI, V23
  • [5] Accuracy of random-forest-based imputation of missing data in the presence of non-normality, non-linearity, and interaction
    Hong, Shangzhi
    Lynn, Henry S.
    [J]. BMC MEDICAL RESEARCH METHODOLOGY, 2020, 20 (01)
  • [6] Comparison of Performance of Data Imputation Methods for Numeric Dataset
    Jadhav, Anil
    Pramod, Dhanya
    Ramanathan, Krishnan
    [J]. APPLIED ARTIFICIAL INTELLIGENCE, 2019, 33 (10) : 913 - 933
  • [7] missForest with feature selection using binary particle swarm optimization improves the imputation accuracy of continuous data
    Jin, Heejin
    Jung, Surin
    Won, Sungho
    [J]. GENES & GENOMICS, 2022, 44 (06) : 651 - 658
  • [8] Little RJ, 2019, STAT ANAL MISSING DA, V793
  • [9] Practical Strategies for Extreme Missing Data Imputation in Dementia Diagnosis
    McCombe, Niamh
    Liu, Shuo
    Ding, Xuemei
    Prasad, Girijesh
    Bucholc, Magda
    Finn, David P.
    Todd, Stephen
    McClean, Paula L.
    Wong-Lin, Kongfatt
    [J]. IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2022, 26 (02) : 818 - 827
  • [10] Missing data and multiple imputation in clinical epidemiological research
    Pedersen, Alma B.
    Mikkelsen, Ellen M.
    Cronin-Fenton, Deirdre
    Kristensen, Nickolaj R.
    Tra My Pham
    Pedersen, Lars
    Petersen, Irene
    [J]. CLINICAL EPIDEMIOLOGY, 2017, 9 : 157 - 165