Application of multi-label classification models for the diagnosis of diabetic complications

Cited by: 18
Authors
Zhou, Liang [1]
Zheng, Xiaoyuan [1]
Yang, Di [2]
Wang, Ying [1]
Bai, Xuesong [3]
Ye, Xinhua [1]
Affiliations
[1] Nanjing Med Univ, Changzhou Peoples Hosp 2, Dept Endocrinol, 29 Xinglongxiang Rd, Changzhou 213000, Jiangsu, Peoples R China
[2] Shanghai Jiao Tong Univ, Sch Med, Shanghai 200025, Peoples R China
[3] Capital Med Univ, Beijing 100053, Peoples R China
Keywords
Diabetic complication; Machine learning; Multi-label classification; Correlation; Key indicators; INDIVIDUAL PARTICIPANT DATA; RISK EQUATIONS; TYPE-2; VALIDATION; MELLITUS; DISEASE
DOI
10.1186/s12911-021-01525-7
Chinese Library Classification (CLC) number
R-058
Subject classification code
Abstract
Background: Early diagnosis of diabetic complications is clinically demanding and of great significance. Given the complexity of diabetic complications, we applied a multi-label classification (MLC) model to predict four diabetic complications simultaneously using data from electronic health records (EHRs), and leveraged the correlations between the complications to further improve prediction accuracy.
Methods: We obtained demographic characteristics and laboratory data from the EHRs of patients admitted to Changzhou No. 2 People's Hospital, the affiliated hospital of Nanjing Medical University in China, from May 2013 to June 2020. The data covered 9,765 patients and 93 biochemical indicators. We used the Pearson correlation coefficient (PCC) to analyze the correlations between different diabetic complications from a statistical perspective. We used an MLC model, based on the Random Forest (RF) technique, to leverage these correlations and predict four complications simultaneously. We explored four different MLC models: Label Power Set (LP), Classifier Chains (CC), Ensemble Classifier Chains (ECC), and Calibrated Label Ranking (CLR). We used traditional Binary Relevance (BR) as a comparison. We used 11 different performance metrics and the area under the receiver operating characteristic curve (AUROC) to evaluate these models. We analyzed the weights of the learned model and illustrated (1) the top 10 key indicators of different complications and (2) the correlations between different diabetic complications.
Results: The MLC models CC, ECC, and CLR outperformed the traditional BR method on most performance metrics. The ECC models performed best in Hamming loss (0.1760), Accuracy (0.7020), F1_Score (0.7855), Precision (0.8649), F1_micro (0.8078), F1_macro (0.7773), Recall_micro (0.8631), Recall_macro (0.8009), and AUROC (0.8231). The two diabetic complication correlation matrices drawn from the PCC analysis and from the MLC models were consistent with each other and indicated that the complications were correlated to different extents. The top 10 key indicators given by the model are valuable in medical applications.
Conclusions: Our MLC model can effectively utilize the potential correlations between different diabetic complications to further improve prediction accuracy. This model should be explored further in other complex diseases with multiple complications.
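The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn, uses synthetic data from `make_multilabel_classification` in place of the EHR data, and covers only BR, CC, and ECC (LP and CLR are omitted). The sample size, feature count, number of trees, number of chains, and the helper name `fit_and_score` are all illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import hamming_loss
from sklearn.model_selection import train_test_split
from sklearn.multioutput import ClassifierChain, MultiOutputClassifier

# Synthetic stand-in for the study data (assumption): 20 features, 4 labels.
X, Y = make_multilabel_classification(
    n_samples=600, n_features=20, n_classes=4, n_labels=2, random_state=0
)
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.25, random_state=0
)

# Pearson correlation between the label columns (4 x 4 matrix).
label_corr = np.corrcoef(Y_train, rowvar=False)


def make_rf():
    """Random Forest base learner; 50 trees is an arbitrary choice."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


def fit_and_score(model):
    """Fit a multi-label model and return its predictions and Hamming loss."""
    model.fit(X_train, Y_train)
    pred = model.predict(X_test).astype(int)
    return pred, hamming_loss(Y_test, pred)


# Binary Relevance: one independent classifier per label.
br_pred, br_loss = fit_and_score(MultiOutputClassifier(make_rf()))

# Classifier Chain: each label's classifier also sees the earlier labels.
cc_pred, cc_loss = fit_and_score(
    ClassifierChain(make_rf(), order="random", random_state=0)
)

# Ensemble of Classifier Chains: average the probabilities of several
# chains with different random label orders, then threshold at 0.5.
chains = [
    ClassifierChain(make_rf(), order="random", random_state=i) for i in range(5)
]
for chain in chains:
    chain.fit(X_train, Y_train)
ecc_proba = np.mean([chain.predict_proba(X_test) for chain in chains], axis=0)
ecc_pred = (ecc_proba >= 0.5).astype(int)
ecc_loss = hamming_loss(Y_test, ecc_pred)

print("Label correlation matrix:\n", np.round(label_corr, 2))
print(f"Hamming loss  BR: {br_loss:.4f}  CC: {cc_loss:.4f}  ECC: {ecc_loss:.4f}")
```

On synthetic data the ranking of the three methods need not match the reported results; the sketch only shows how chained classifiers pass label information forward while Binary Relevance treats each label independently.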
Pages: 10