A Machine-Learning-Based Prediction Method for Hypertension Outcomes Based on Medical Data

Cited by: 110
Authors
Chang, Wenbing [1 ]
Liu, Yinglai [1 ]
Xiao, Yiyong [1 ]
Yuan, Xinglong [1 ]
Xu, Xingxing [1 ]
Zhang, Siyue [1 ]
Zhou, Shenghan [1 ]
Affiliations
[1] Beihang Univ, Sch Reliabil & Syst Engn, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
hypertension outcomes; feature selection; recursive feature elimination; classification algorithm; XGBoost; prediction; GENE SELECTION; RANDOM FOREST; GLOBAL BURDEN; SVM-RFE; CLASSIFICATION; COMPLICATIONS; NETWORKS;
DOI
10.3390/diagnostics9040178
Chinese Library Classification
R5 [Internal Medicine];
Discipline Codes
1002; 100201;
Abstract
The outcomes of hypertension refer to death or serious complications (such as myocardial infarction or stroke) that may occur in patients with hypertension. These outcomes are a major concern for patients and doctors, and avoiding them is the goal of treatment. However, no satisfactory method exists for predicting them. This paper therefore proposes a method for predicting outcomes from the physical examination indicators of hypertension patients. We divide outcome prediction into two steps. The first step extracts the key features from the patients' many physical examination indicators. The second step uses these key features to predict the patients' outcomes. To this end, we propose a model combining recursive feature elimination with cross-validation and a classification algorithm. In the first step, we use the recursive feature elimination algorithm to rank the importance of all features and then extract the optimal feature subset using cross-validation. In the second step, we use four classification algorithms (support vector machine (SVM), C4.5 decision tree, random forest (RF), and extreme gradient boosting (XGBoost)) to predict patient outcomes from the optimal feature subset. Model performance is evaluated by accuracy, F1 measure, and area under the receiver operating characteristic curve (AUC). Ten-fold cross-validation shows that C4.5, RF, and XGBoost achieve very good prediction results with a small number of features, and that classifiers trained after recursive-feature-elimination-with-cross-validation feature selection have better prediction performance. Among the four classifiers, XGBoost performs best: using the optimal feature subset, its accuracy, F1, and AUC values are 94.36%, 0.875, and 0.927, respectively.
This article's prediction of hypertension outcomes contributes to the in-depth study of hypertension complications and has strong practical significance.
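The two-step pipeline described in the abstract (recursive feature elimination with cross-validation, followed by classification on the selected subset) can be sketched with scikit-learn. This is an illustrative sketch, not the authors' code: the synthetic dataset stands in for the patients' physical examination indicators, and a random forest stands in for the paper's four classifiers (an XGBoost estimator could be dropped in the same way).

```python
# Sketch of the paper's two-step method:
#   Step 1 - rank features by recursive feature elimination and choose
#            the subset size by cross-validation (RFECV);
#   Step 2 - evaluate a classifier on the selected subset with 10-fold CV.
# Synthetic data replaces the (unavailable) physical-examination dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Stand-in for the patients' examination indicators and outcome labels.
X, y = make_classification(n_samples=400, n_features=30,
                           n_informative=8, random_state=0)

# Step 1: RFE ranks features by importance; cross-validation picks
# how many top-ranked features to keep.
selector = RFECV(RandomForestClassifier(n_estimators=50, random_state=0),
                 step=1, cv=StratifiedKFold(5), scoring="accuracy")
selector.fit(X, y)
X_selected = selector.transform(X)

# Step 2: 10-fold cross-validated AUC on the optimal feature subset,
# mirroring the paper's evaluation of SVM, C4.5, RF, and XGBoost.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
auc_scores = cross_val_score(clf, X_selected, y, cv=10, scoring="roc_auc")
print(f"selected {selector.n_features_} features, "
      f"mean AUC {auc_scores.mean():.3f}")
```

The same `RFECV` selector can wrap any estimator exposing `feature_importances_` or `coef_`, which is how the paper's comparison across four classifiers would proceed.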
Pages: 21
References
43 in total
[1]   Using methods from the data-mining and machine-learning literature for disease classification and prediction: a case study examining classification of heart failure subtypes [J].
Austin, Peter C. ;
Tu, Jack V. ;
Ho, Jennifer E. ;
Levy, Daniel ;
Lee, Douglas S. .
JOURNAL OF CLINICAL EPIDEMIOLOGY, 2013, 66 (04) :398-407
[2]   An empirical comparison of voting classification algorithms: Bagging, boosting, and variants [J].
Bauer, E ;
Kohavi, R .
MACHINE LEARNING, 1999, 36 (1-2) :105-139
[3]   A machine learning approach for automated recognition of movement patterns using basic, kinetic and kinematic gait data [J].
Begg, R ;
Kamruzzaman, J .
JOURNAL OF BIOMECHANICS, 2005, 38 (03) :401-408
[4]   XGBoost: A Scalable Tree Boosting System [J].
Chen, Tianqi ;
Guestrin, Carlos .
KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, :785-794
[5]  
Cortes C., 1995, MACH LEARN, V20, P273, DOI 10.1007/BF00994018
[6]   Training invariant support vector machines [J].
Decoste, D ;
Schölkopf, B .
MACHINE LEARNING, 2002, 46 (1-3) :161-190
[7]  
Devadason P., 2014, INT J INTERDISCIP MU, V1, P160
[8]   Gene selection and classification of microarray data using random forest -: art. no. 3 [J].
Díaz-Uriarte, R ;
de Andrés, SA .
BMC BIOINFORMATICS, 2006, 7 (1)
[9]   Improving the performance of SVM-RFE to select genes in microarray data [J].
Ding, Yuanyuan ;
Wilkins, Dawn .
BMC BIOINFORMATICS, 2006, 7 (Suppl 2)
[10]  
Duan KB, 2007, LECT NOTES COMPUT SC, V4447, P47