Machine Learning Analysis for Data Incompleteness (MADI): Analyzing the Data Completeness of Patient Records Using a Random Variable Approach to Predict the Incompleteness of Electronic Health Records

Times cited: 7
Authors
Gurupur, Varadraj P. [1]
Shelleh, Muhammed [2]
Affiliations
[1] Univ Cent Florida, Dept Hlth Management & Informat, Orlando, FL 32826 USA
[2] Univ Cent Florida, Dept Comp Sci, Orlando, FL 32826 USA
Keywords
Histograms; Fitting; Electronic medical records; Data models; Entropy; Machine learning; Random variables; Health informatics; big data models; data completeness; probability density; Kolmogorov-Smirnov test; support vector machine; stochastic gradient descent; generalized additive model; electronic health records; CARE
DOI
10.1109/ACCESS.2021.3095240
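Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline code
0812 (Computer Science and Technology)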
Abstract
The purpose of this article is to propose a methodology, combining several methods, that can be used to predict the data incompleteness of a dataset. The investigators model data incompleteness as both a continuous and a discrete random variable. In addition, they use transfer entropy to advance the analysis of data incompleteness in electronic health records. The underlying methodology, named "Machine Learning Analysis for Data Incompleteness" (MADI), is intended as a possible solution to data incompleteness in electronic health records. MADI combines the Kolmogorov-Smirnov goodness-of-fit test with the Mielke and beta distributions for a holistic analysis of data incompleteness. Alongside this methodology, the investigators explored stochastic gradient descent, generalized additive models, and support vector machines for comparison. Overall, the investigators present a complete set of methods and algorithms to help predict data incompleteness in a medical setting and offer suggestions for practical applications of this prediction.
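The abstract treats incompleteness as a random variable and checks candidate distributions with a Kolmogorov-Smirnov goodness-of-fit test. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' MADI implementation. It assumes that a record's completeness is the fraction of a hypothetical set of required fields (`REQUIRED_FIELDS`) that are present (the continuous view) and that the number of missing fields gives the discrete view. The toy records, the helper names, and the beta and Mielke parameters are all hypothetical and chosen for illustration, not fitted. The Mielke CDF is the standard Mielke Beta-Kappa form x^k / (1 + x^s)^(k/s), whose density matches the standardized form of `scipy.stats.mielke`; the beta CDF is restricted to integer parameters so that only the standard library is needed.

```python
import math
from typing import Callable, Dict, Optional, Sequence

# Hypothetical schema: a real EHR extract would define its own required fields.
REQUIRED_FIELDS = ("age", "sex", "diagnosis", "medication")


def completeness(record: Dict[str, Optional[str]], fields: Sequence[str]) -> float:
    """Continuous view: fraction of required fields that are present and non-empty."""
    filled = sum(1 for f in fields if record.get(f) not in (None, ""))
    return filled / len(fields)


def missing_count(record: Dict[str, Optional[str]], fields: Sequence[str]) -> int:
    """Discrete view: number of required fields that are absent or empty."""
    return sum(1 for f in fields if record.get(f) in (None, ""))


def mielke_cdf(x: float, k: float, s: float) -> float:
    """Standard Mielke Beta-Kappa CDF: F(x) = x**k / (1 + x**s)**(k / s) for x > 0."""
    if x <= 0.0:
        return 0.0
    return x ** k / (1.0 + x ** s) ** (k / s)


def beta_cdf_int(x: float, a: int, b: int) -> float:
    """Beta(a, b) CDF for positive integer a, b via the binomial-sum identity."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    n = a + b - 1
    return sum(math.comb(n, j) * x ** j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D = sup |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # Compare F(x) with the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d: float, n: int) -> float:
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov distribution."""
    t = math.sqrt(n) * d
    if t < 0.2:
        return 1.0  # the series converges slowly here, and P(K > t) is effectively 1
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * t * t) for j in range(1, 101))
    return max(0.0, min(1.0, 2.0 * total))


# Hypothetical toy records; None or "" marks a missing value.
records = [
    {"age": "54", "sex": "F", "diagnosis": "E11", "medication": "metformin"},
    {"age": "61", "sex": "M", "diagnosis": None, "medication": "insulin"},
    {"age": None, "sex": "F", "diagnosis": "I10", "medication": ""},
    {"age": "47", "sex": "M", "diagnosis": "J45", "medication": "albuterol"},
]
scores = [completeness(r, REQUIRED_FIELDS) for r in records]  # [1.0, 0.75, 0.5, 1.0]
missing = [missing_count(r, REQUIRED_FIELDS) for r in records]  # [0, 1, 2, 0]

# Candidate models with hypothetical (not fitted) parameters.
candidates = {
    "beta(3, 1)": lambda x: beta_cdf_int(x, 3, 1),
    "mielke(k=2, s=4)": lambda x: mielke_cdf(x, 2.0, 4.0),
}
for name, cdf in candidates.items():
    d = ks_statistic(scores, cdf)
    print(f"{name}: D={d:.3f}, p~{ks_pvalue(d, len(scores)):.3f}")
```

The p-value from `ks_pvalue` is asymptotic and assumes a fully specified continuous distribution. When parameters are estimated from the same data, when the sample contains ties (as completeness fractions over a few fields do), or when the sample is as small as this one, it is only a rough guide; the D statistic is still a reasonable way to rank candidate distributions against each other.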
Pages: 95994-96001
Number of pages: 8