Predicting the Risk of Diabetes in Big Data Electronic Health Records by using Scalable Random Forest Classification Algorithm

Cited by: 0
Authors
Rallapalli, Sreekanth [1]
Suryakanthi, T. [2]
Affiliations
[1] Botho Univ, Fac Comp, Gaborone, Botswana
[2] Univ Botswana, Fac Business, Gaborone, Botswana
Source
2016 THIRD INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATION AND ENGINEERING (ICACCE 2016) | 2016
Keywords
Algorithm; Big Data; Classification; Cloud; EHR; Predictive model; Random Forest; Regression
DOI
Not available
Chinese Library Classification number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Electronic Health Records (EHR) are growing at an exponential rate and are stored in enterprise databases or cloud storage. These records have grown large enough to be called Big Data, and most of the data are unstructured. Processing the data in the cloud can lower processing costs. Predictive analytics helps physicians identify, at an early stage, patients who are likely to be admitted to hospital. Performing predictive analytics requires considering various factors, including demographic data, hospital parameters, patient history, and indicators for a specific disease, but identifying the strong indicators needed for accurate prediction is a challenging task. Given the factors being considered, various models and algorithms need to be studied. Candidate algorithms such as Naive Bayes, Linear Regression, generalized additive models, Random Forest, Logistic Regression, and Hidden Markov Models have to be considered when developing a predictive model. In this paper we propose a predictive model using a scalable Random Forest classification algorithm that can accurately identify the classifier rate for risk of diabetes.
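To make the Random Forest idea in the abstract concrete, the following is a minimal sketch in pure Python of the general technique (bootstrap sampling, a random feature subset per split, majority voting). It is not the authors' implementation, which this record does not describe. The three features (glucose, BMI, age), their value ranges, and all function names are hypothetical, and the data are synthetic and deliberately well separated, with no clinical meaning.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values()) if n else 0.0

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score, n = None, gini(labels), len(rows)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_sub, depth, rng):
    """Grow one tree; a leaf is a class label, a node is (f, t, left, right)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == 0 or len(set(labels)) == 1:
        return majority
    # Random feature subset at each split: the "random" in Random Forest.
    feats = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, feats)
    if split is None:
        return majority
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], n_sub, depth - 1, rng),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], n_sub, depth - 1, rng))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def fit_forest(rows, labels, n_trees=25, max_depth=4, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, max_depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Synthetic, hypothetical records: [glucose, BMI, age]; label 1 = at risk.
data_rng = random.Random(42)
rows, labels = [], []
for _ in range(20):
    rows.append([data_rng.uniform(70, 99), data_rng.uniform(18, 25), data_rng.uniform(20, 39)])
    labels.append(0)
    rows.append([data_rng.uniform(130, 200), data_rng.uniform(30, 40), data_rng.uniform(50, 75)])
    labels.append(1)

forest = fit_forest(rows, labels)
print(predict_forest(forest, [180.0, 35.0, 60.0]))  # a high-glucose, high-BMI record
print(predict_forest(forest, [68.0, 17.5, 19.0]))   # a low-glucose, low-BMI record
```

Because each tree is trained independently on its own bootstrap sample, the loop in `fit_forest` could in principle be distributed across workers, which is one common route to a scalable variant; whether the paper takes that route is not stated in this record.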
Pages: 281-284
Page count: 4