Early prediction of diabetes by applying data mining techniques: A retrospective cohort study

Cited by: 1
Authors
Al Yousef, Mohammed Zeyad [1]
Yasky, Adel Fouad [1]
Al Shammari, Riyad [2,3]
Ferwana, Mazen S. [4]
Affiliations
[1] King Abdullah Int Med Res Ctr, Family Med, King Abdulaziz Med City, Riyadh 14812, Saudi Arabia
[2] King Saud bin Abdul Aziz Univ Hlth Sci, Dept Hlth Informat, Coll Publ Hlth & Hlth Informat, Riyadh, Saudi Arabia
[3] Ctr Excellence Hlth Informat, Riyadh, Saudi Arabia
[4] King Abdul Aziz Med City, Family Med & Primary Healthcare Dept, Riyadh, Saudi Arabia
Keywords
data mining; diabetes; diabetes prevention; RISK SCORE; SAUDI-ARABIA; PERFORMANCE; VALIDATION; MELLITUS; KINGDOM;
DOI
10.1097/MD.0000000000029588
Chinese Library Classification
R5 [Internal Medicine];
Subject classification code
1002; 100201;
Abstract
Background: Saudi Arabia ranks 7th globally in terms of diabetes prevalence, and its prevalence is expected to reach 45.36% by 2030. The cost of diabetes is expected to increase to 27 billion Saudi riyals in cases where undiagnosed individuals are also documented. Prevention and early detection can effectively address these challenges.
Objective: To improve healthcare services and assist in building predictive models to estimate the probability of diabetes in patients.
Methods: A chart review, which was a retrospective cohort study, was conducted at the National Guard Health Affairs in Riyadh, Saudi Arabia. Data were collected from 5 hospitals using National Guard Health Affairs databases. We used 38 attributes of 21431 patients between 2015 and 2019. The following phases were performed: (1) data collection, (2) data preparation, (3) data mining and model building, and (4) model evaluation and validation. Subsequently, 6 algorithms were compared with and without the synthetic minority oversampling technique.
Results: The highest performance was found in the Bayesian network, which had an area under the curve of 0.75 and 0.71.
Conclusion: Although the results were acceptable, they could be improved. In this context, missing data owing to technical issues played a major role in affecting the performance of our model. Nevertheless, the model could be used in prevention, health monitoring programs, and as an automated mass population screening tool without the need for extra costs compared to traditional methods.
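For readers unfamiliar with the two technical terms in the abstract, the following is a minimal sketch of the synthetic minority oversampling technique (SMOTE) and of the area under the ROC curve (AUC). It is not the authors' implementation: the abstract does not name the software used, and the function names `smote` and `auc`, the parameter `k`, and the toy data are all assumptions made for illustration.

```python
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic minority-class points (illustrative sketch).

    Each synthetic point lies on the line segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by squared Euclidean distance to base.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


def auc(labels, scores):
    """AUC as the probability that a positive outscores a negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four minority-class points in two dimensions.
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_new=5)

# Hypothetical labels and model scores: 3 of the 4 positive/negative pairs
# are ranked correctly, so the AUC is 0.75.
example_auc = auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In a comparison such as the one the abstract describes, oversampling would normally be applied to the training data only, so that the AUC is computed on records the model has not seen and that contain no synthetic points.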
Pages: 8