Imputation of Missing Data in Electronic Health Records Based on Patients' Similarities

Cited by: 17
Authors
Jazayeri, Ali [1]
Liang, Ou Stella [1]
Yang, Christopher C. [1]
Affiliations
[1] Drexel Univ, Coll Comp & Informat, Philadelphia, PA 19104 USA
Funding
National Science Foundation (USA)
Keywords
Missing data imputation; Electronic health records; Similarity-based imputation; Chained equations
DOI
10.1007/s41666-020-00073-5
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Electronic health records (EHR) are an increasingly common source of data for mining and analyzing different health conditions. However, due to irregular observation times and other uncertainties inherent in medical settings, EHR data sets contain a large number of missing values. Most traditional data mining and machine learning approaches are designed to operate on complete data. In this paper, we propose a novel imputation method for missing data to facilitate using these approaches for the analysis of EHR data. The imputation is based on a set of interpatient, multivariate similarities among patients. For a missing data point in a patient's lab results during their intensive care unit stay, the method ranks other patients by their similarity to the ego patient in terms of lab values. The missing value is then estimated as a weighted average of the known values of the same laboratory test from the other patients, with their similarities used as weights. A comparison of the values estimated by the proposed method with values estimated by several common and state-of-the-art methods, such as MICE and 3D-MICE, shows that the proposed method outperforms them and produces promising results.
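The rank-then-average step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the similarity measure, so the function name `impute_by_similarity`, the inverse root-mean-square-distance similarity, and the neighbor count `k` are all assumptions made here for illustration only.

```python
import numpy as np


def impute_by_similarity(X, patient, test, k=2):
    """Estimate X[patient, test] from the k most similar patients.

    X is a (patients x lab tests) array with np.nan marking missing values.
    The similarity used here (1 / (1 + RMS distance over co-observed tests))
    is an assumption for this sketch, not the measure from the paper.
    """
    target = X[patient]
    candidates = []
    for other in range(X.shape[0]):
        # Only patients with a known value for this test can contribute.
        if other == patient or np.isnan(X[other, test]):
            continue
        # Compare the two patients on tests that both have observed.
        shared = ~np.isnan(target) & ~np.isnan(X[other])
        shared[test] = False
        if not shared.any():
            continue
        dist = np.sqrt(np.mean((target[shared] - X[other, shared]) ** 2))
        similarity = 1.0 / (1.0 + dist)
        candidates.append((similarity, X[other, test]))
    if not candidates:
        return np.nan  # no comparable patient, leave the value missing
    # Rank by similarity and keep the k most similar patients.
    candidates.sort(key=lambda c: c[0], reverse=True)
    top = candidates[:k]
    weights = np.array([s for s, _ in top])
    values = np.array([v for _, v in top])
    # Similarity-weighted average of the known values.
    return float(np.sum(weights * values) / np.sum(weights))


# Toy data: rows are patients, columns are lab tests.
labs = np.array([
    [1.0, 2.0, np.nan],
    [1.0, 2.0, 10.0],
    [1.0, 2.0, 20.0],
    [100.0, 200.0, 1000.0],
])
estimate = impute_by_similarity(labs, patient=0, test=2, k=2)
```

In this toy example the two patients whose other lab values match the ego patient dominate the estimate, while the dissimilar fourth patient is ranked last and excluded by `k=2`.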
Pages: 295-307
Number of pages: 13
Related papers
20 in total
  • [1] Ajami Sima, 2013, Acta Inform Med, V21, P129, DOI 10.5455/aim.2013.21.129-134
  • [2] Multiple imputation by chained equations: what is it and how does it work?
    Azur, Melissa J.
    Stuart, Elizabeth A.
    Frangakis, Constantine
    Leaf, Philip J.
    [J]. INTERNATIONAL JOURNAL OF METHODS IN PSYCHIATRIC RESEARCH, 2011, 20 (01) : 40 - 49
  • [3] Recurrent Neural Networks for Multivariate Time Series with Missing Values
    Che, Zhengping
    Purushotham, Sanjay
    Cho, Kyunghyun
    Sontag, David
    Liu, Yan
    [J]. SCIENTIFIC REPORTS, 2018, 8
  • [4] Dhevi ATS, 2014, INT CONF ADV COMPU, P255, DOI 10.1109/ICoAC.2014.7229721
  • [5] A neural network-based framework for the reconstruction of incomplete data sets
    Gheyas, Iffat A.
    Smith, Leslie S.
    [J]. NEUROCOMPUTING, 2010, 73 (16-18) : 3039 - 3065
  • [6] Next-generation phenotyping of electronic health records
    Hripcsak, George
    Albers, David J.
    [J]. JOURNAL OF THE AMERICAN MEDICAL INFORMATICS ASSOCIATION, 2013, 20 (01) : 117 - 121
  • [7] Missing data imputation using statistical and machine learning methods in a real breast cancer problem
    Jerez, Jose M.
    Molina, Ignacio
    Garcia-Laencina, Pedro J.
    Alba, Emilio
    Ribelles, Nuria
    Martin, Miguel
    Franco, Leonardo
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2010, 50 (02) : 105 - 115
  • [8] MIMIC-III, a freely accessible critical care database
    Johnson, Alistair E. W.
    Pollard, Tom J.
    Shen, Lu
    Lehman, Li-wei H.
    Feng, Mengling
    Ghassemi, Mohammad
    Moody, Benjamin
    Szolovits, Peter
    Celi, Leo Anthony
    Mark, Roger G.
    [J]. SCIENTIFIC DATA, 2016, 3
  • [9] Personalized Mortality Prediction Driven by Electronic Medical Data and a Patient Similarity Metric
    Lee, Joon
    Maslove, David M.
    Dubin, Joel A.
    [J]. PLOS ONE, 2015, 10 (05):
  • [10] Lipton ZC., 2016, Modeling missing data in clinical time series with rnns, V56, P253