Machine learning based predictive model of Type 2 diabetes complications using Malaysian National Diabetes Registry: A study protocol

Cited by: 3
Authors
Abas, Mohamad Zulfikrie [1, 4]
Li, Ken [2]
Hairi, Noran Naqiah [1]
Choo, Wan Yuen [1, 5]
Wan, Kim Sui [3]
Affiliations
[1] Univ Malaya, Kuala Lumpur, Malaysia
[2] Univ London, Goldsmiths Coll, London, England
[3] Minist Hlth Malaysia, Inst Publ Hlth, Shah Alam, Selangor, Malaysia
[4] Univ Malaya, Kuala Lumpur 50603, Malaysia
[5] Univ Malaya, Fac Med, Social & Prevent Med Dept, Kuala Lumpur 50603, Malaysia
Keywords
Type 2 diabetes; machine learning; predictive models; diabetes complications; diabetes registry
DOI
10.1177/22799036241231786
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification codes
1004; 120402
Abstract
Background: The prevalence of diabetes in Malaysia is increasing, and identifying patients at higher risk of complications is crucial for effective management. The use of machine learning (ML) to develop prediction models has been shown to outperform non-ML models. This study aims to develop predictive models for Type 2 Diabetes (T2D) complications in Malaysia using ML techniques.
Design and methods: This 10-year retrospective cohort study uses clinical audit datasets from the Malaysian National Diabetes Registry from 2011 to 2021. T2D patients who received treatment in public health clinics in the southern region of Malaysia and have at least two data points in 10 years are included. Patients with diabetes complications at baseline are excluded to ensure temporality between the predictors and the target variable. Appropriate methods are used to address issues related to data cleaning, missing data imputation, data splitting, feature selection, and class imbalance. The study uses seven ML algorithms (logistic regression, support vector machine, k-nearest neighbours, decision tree, random forest, extreme gradient boosting, and light gradient boosting machine) to develop predictive models for four target variables: nephropathy, retinopathy, ischaemic heart disease, and stroke. Hyperparameter tuning is performed for each algorithm. Model training uses stratified k-fold cross-validation. The best model for each algorithm is evaluated on a hold-out dataset using multiple metrics.
Expected impact of the study on public health: The prediction model may be a valuable tool for diabetes management and secondary prevention by enabling earlier interventions and optimal resource allocation, leading to better health outcomes.
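The abstract names stratified k-fold cross-validation but gives no further detail in this record. The following is only a minimal sketch of the general technique, not the authors' implementation: the function name `stratified_k_fold`, the choice of k = 5, the seed, and the toy labels are all assumptions made for illustration. Each fold is built to keep roughly the same proportion of patients with and without a complication, which matters when the outcome is rare.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs whose folds preserve class proportions.

    Hypothetical helper for illustration; not taken from the study protocol.
    """
    rng = random.Random(seed)

    # Group row indices by outcome class.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin across folds
    # so every fold receives a near-equal share of each class.
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for i in range(k):
        val_idx = sorted(folds[i])
        train_idx = sorted(
            idx for j, fold in enumerate(folds) if j != i for idx in fold
        )
        yield train_idx, val_idx


# Toy outcome labels (assumed, not registry data): 1 = complication, 0 = none.
labels = [1] * 10 + [0] * 40
splits = list(stratified_k_fold(labels, k=5))
```

In a pipeline like the one described, a split of this kind would be applied only to the training portion, with the hold-out dataset set aside beforehand so that it plays no part in hyperparameter tuning.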
Pages: 8