Using machine learning models to improve stroke risk level classification methods of China national stroke screening

Cited by: 31
Authors
Li, Xuemeng [1]
Bian, Di [2]
Yu, Jinghui [1]
Li, Mei [3]
Zhao, Dongsheng [1]
Affiliations
[1] Acad Mil Med Sci, Informat Ctr, Beijing, Peoples R China
[2] Xian Univ Sci & Technol, Sch Elect & Control Engn, Xian, Peoples R China
[3] China Stroke Data Ctr, Beijing, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
National Stroke Screening; Machine learning models; Risk level classification
DOI
10.1186/s12911-019-0998-2
CLC number
R-058
Abstract
Background: Characterized by high incidence, high prevalence, and high mortality, stroke places a heavy burden on families and society in China. In 2009, the Ministry of Health of China launched the China national stroke screening and intervention program, which screens for stroke and its risk factors and intervenes in high-risk populations among people aged over 40 throughout China. In this program, the stroke risk factors are hypertension, diabetes, dyslipidemia, smoking, lack of exercise, apparent overweight, and family history of stroke. People with more than two risk factors, or with a history of stroke or transient ischemic attack (TIA), are considered high-risk. However, this criterion cannot assign a stroke risk level to people whose risk-factor fields contain unknown values. These missing risk levels reduce the efficiency of stroke interventions and introduce inaccuracies into national-level statistics. In this paper, we use the 2017 national stroke screening data to develop stroke risk classification models based on machine learning algorithms in order to improve classification efficiency.

Method: First, we construct a training set and test sets, and handle the class imbalance of the training set with oversampling and undersampling methods. Then, we develop a logistic regression model, a naive Bayes model, a Bayesian network model, a decision tree model, a neural network model, a random forest model, a bagged decision tree model, a voting model, and a boosting model with decision trees to classify stroke risk levels.

Result: The boosting model with decision trees has the highest recall (99.94%), and the random forest model has the highest precision (97.33%). Compared with the method currently used, the random forest model (recall: 98.44%) increases recall by about 2.8%, which means several thousand more people at high risk of stroke can be identified each year.
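The Method names oversampling and undersampling, and the Result reports recall and precision, but the abstract does not say which resampling variants were used. The sketch below assumes plain random oversampling (duplicating minority-class rows) and random undersampling (discarding majority-class rows), with label 1 standing for high-risk. The function names are illustrative, not the authors' code. As in the abstract, resampling is meant for the training set only, so the test sets keep the real class ratio when precision and recall are computed.

```python
import random
from collections import Counter

# Assumption (not from the paper): labels are integers,
# with 1 = high-risk (minority class) and 0 = not high-risk.


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly drawn rows of each smaller class until every class
    is as large as the largest one. The original rows are kept in front."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_undersample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, lab in zip(rows, labels) if lab == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


def precision_recall(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy training set: 2 high-risk rows and 8 not-high-risk rows.
    rows = [[i] for i in range(10)]
    labels = [1, 1] + [0] * 8
    _, over = random_oversample(rows, labels)
    _, under = random_undersample(rows, labels)
    print(Counter(over), Counter(under))
    print(precision_recall([1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0]))  # (1.0, 0.75)
```

Oversampling keeps every original record but repeats minority rows, while undersampling avoids repeats at the cost of discarding data. That trade-off is one reason to try both on the same training set.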
Conclusion: The models developed in this paper can improve the current screening method because they are not affected by unknown values, and they can therefore avoid unnecessary rescreening and intervention expenditures. The national stroke screening program can choose among these classification models according to practical needs.
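The claim about unknown values can be made concrete with a three-valued version of the current rule from the Background. The sketch below is an illustration, not the program's implementation. The field names are hypothetical, each field is True, False, or None (unknown, including a missing field), and "more than two risk factors, or a history of stroke or TIA" is taken literally from the abstract. Records the function labels undetermined are those the rule cannot classify.

```python
from typing import Mapping, Optional

# Hypothetical field names for the seven risk factors listed in the abstract.
RISK_FACTORS = (
    "hypertension", "diabetes", "dyslipidemia", "smoking",
    "lack_of_exercise", "overweight", "family_history_of_stroke",
)
# Hypothetical field names for a history of stroke or TIA.
HISTORY_FIELDS = ("stroke_history", "tia_history")

HIGH, NOT_HIGH, UNDETERMINED = "high-risk", "not high-risk", "undetermined"


def screening_rule(record: Mapping[str, Optional[bool]]) -> str:
    """Apply 'more than two risk factors, or stroke/TIA history'.

    A field that is None or absent counts as unknown. The result is
    UNDETERMINED whenever the unknown fields could still change the outcome.
    """
    history = [record.get(f) for f in HISTORY_FIELDS]
    factors = [record.get(f) for f in RISK_FACTORS]
    known_pos = sum(1 for v in factors if v is True)
    unknown = sum(1 for v in factors if v is None)

    # Known positives alone already settle the case.
    if any(v is True for v in history) or known_pos > 2:
        return HIGH

    # Not high-risk only if no history is unknown and, even if every
    # unknown factor were positive, the count would still not exceed two.
    history_unknown = any(v is None for v in history)
    if not history_unknown and known_pos + unknown <= 2:
        return NOT_HIGH
    return UNDETERMINED
```

For example, a person with two known risk factors and smoking status unknown is undetermined under this rule, whereas a trained classifier can still output a risk level for the same record.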
Pages: 7