Characterisation of cardiovascular disease (CVD) incidence and machine learning risk prediction in middle-aged and elderly populations: data from the China health and retirement longitudinal study (CHARLS)
Cited by: 0
Authors:
Huang, Qing [1]
Jiang, Zihao [1]
Shi, Bo [2]
Meng, Jiaxu [2]
Shu, Li [1]
Hu, Fuyong [1]
Mi, Jing [1]
Affiliations:
[1] Bengbu Med Univ, Sch Publ Hlth, 2600 Donghai Ave, Bengbu 233030, Anhui, Peoples R China
[2] Bengbu Med Univ, Sch Med Imaging, 2600 Donghai Ave, Bengbu 233030, Anhui, Peoples R China
Keywords:
Cardiovascular disease;
Middle-aged and elderly individuals;
Morbidity characteristics;
Machine learning;
Predictive modelling;
MORTALITY;
UPDATE;
DOI:
10.1186/s12889-025-21609-7
Chinese Library Classification (CLC) number:
R1 [Preventive medicine, hygiene];
Discipline classification codes:
1004 ;
120402 ;
Abstract:
Background: Owing to population ageing and changing lifestyles in China, middle-aged and elderly people have become high-risk groups for cardiovascular disease (CVD). This study aimed to analyse the incidence characteristics of CVD in these populations and to develop a prediction model using data from the China Health and Retirement Longitudinal Study (CHARLS).
Methods: We used CHARLS follow-up data to analyse CVD incidence in the Chinese middle-aged and elderly population over 9 years. Five machine learning (ML) algorithms were employed for risk prediction. Data preprocessing included missing value imputation via random forest. Feature selection was performed before model training using the Least Absolute Shrinkage and Selection Operator with cross-validation (Lasso CV). The synthetic minority over-sampling technique (SMOTE) was applied to address class imbalance. Model performance was evaluated using the area under the ROC curve (AUC), precision, recall, and F1 score, and SHAP plots were used for interpretability.
Results: After the exclusion criteria were applied, 12,580, 12,061, 11,545, and 11,619 participants were enrolled in the four follow-up rounds. The cumulative incidence (CI) of CVD at 2, 4, 7, and 9 years was 2.846%, 8.971%, 17.869%, and 20.518%, respectively. Significant differences in CVD incidence were observed across gender, age, ethnicity, and region, with higher rates in females and in the northeast region. Ultimately, 8,080 participants and 24 features were analysed for CVD risk prediction, and five ML models were built on these features. Although the LGB model achieved an AUC of 0.818, indicating strong overall performance, its F1 score and recall were relatively low, at 0.509 and 43.1%, respectively. Shapley additive explanations (SHAP) analyses revealed the importance of key features, such as night sleep duration, triglyceride (TG) levels, and waist circumference, in predicting outcomes, and highlighted the nonlinear relationships between these features and CVD risk.
Conclusions: Gender, age, ethnicity, and region are significant factors influencing CVD incidence. Although the LGB model demonstrated good overall performance, its low F1 score and recall reveal limitations in identifying patients at high risk of CVD.
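The abstract reports recall and F1 for the LGB model but not precision. Under the standard definition F1 = 2PR / (P + R), the two reported values jointly determine the precision, which helps in reading the "low recall" conclusion. The following is a minimal sketch, assuming both figures refer to the positive (CVD) class at the same decision threshold; the function names are illustrative, and the derived precision is an inference from the reported numbers, not a value stated in the abstract.

```python
def precision_recall_f1(tp, fp, fn):
    """Standard definitions from confusion-matrix counts (positive class)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def implied_precision(f1, recall):
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1).

    Only valid when 2R > F1; otherwise no precision in (0, 1] is consistent.
    """
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 and recall are inconsistent: need 2 * recall > F1")
    return f1 * recall / denom


# Values reported in the abstract for the LGB model (assumed same class/threshold).
reported_f1 = 0.509
reported_recall = 0.431

p = implied_precision(reported_f1, reported_recall)
print(f"Implied precision: {p:.3f}")
```

If these assumptions hold, the implied precision is roughly 0.62: about three in five flagged participants would be true cases, while more than half of the true cases are missed, which is consistent with the stated limitation in identifying high-risk patients.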
Pages: 12