Development and validation of explainable machine-learning models for carotid atherosclerosis early screening

Cited by: 5
Authors
Yun, Ke [1, 2]
He, Tao [3]
Zhen, Shi [4]
Quan, Meihui [1, 2]
Yang, Xiaotao [1, 2]
Man, Dongliang [1, 2]
Zhang, Shuang [1, 2]
Wang, Wei [5]
Han, Xiaoxu [1, 2, 6, 7]
Affiliations
[1] China Med Univ, Affiliated Hosp 1, Natl Clin Res Ctr Lab Med, Shenyang, Liaoning, Peoples R China
[2] China Med Univ, Affiliated Hosp 1, Dept Lab Med, Shenyang, Liaoning, Peoples R China
[3] Neusoft Corp, Neusoft Res Inst, Shenyang, Liaoning, Peoples R China
[4] Northeastern Univ, Dept Software Engn, Shenyang, Liaoning, Peoples R China
[5] China Med Univ, Affiliated Hosp 1, Dept Phys Examinat Ctr, Shenyang, Liaoning, Peoples R China
[6] Chinese Acad Med Sci, Lab Med Innovat Unit, Shenyang, Liaoning, Peoples R China
[7] China Med Univ, Affiliated Hosp 1, NHC Key Lab AIDS Immunol, Shenyang, Liaoning, Peoples R China
Keywords
Machine learning; Carotid atherosclerosis; Explainable model; CHINESE ADULTS; RISK-FACTORS; PREVALENCE; ULTRASOUND; BURDEN; AGE; GENDER;
DOI
10.1186/s12967-023-04093-8
Chinese Library Classification (CLC) number
R-3 [Medical research methods]; R3 [Basic medicine]
Discipline classification code
1001
Abstract
Background: Carotid atherosclerosis (CAS), an important factor in the development of stroke, is a major public health concern. The aim of this study was to establish and validate machine learning (ML) models for early screening of CAS using routine health check-up indicators in northeast China.

Methods: A total of 69,601 health check-up records from the health examination center of the First Hospital of China Medical University (Shenyang, China) were collected between 2018 and 2019. For the 2019 records, 80% were assigned to the training set and 20% to the testing set. The 2018 records were used as the external validation dataset. Ten ML algorithms, including decision tree (DT), K-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), multilayer perceptron (MLP), extreme gradient boosting machine (XGB), gradient boosting decision tree (GBDT), linear support vector machine (SVM-linear), and non-linear support vector machine (SVM-nonlinear), were used to construct CAS screening models. The area under the receiver operating characteristic curve (auROC) and the area under the precision-recall curve (auPR) were used as measures of model performance. The SHapley Additive exPlanations (SHAP) method was used to demonstrate the interpretability of the optimal model.

Results: A total of 6315 records of patients undergoing carotid ultrasonography were collected; of these, 1632, 407, and 1141 patients were diagnosed with CAS in the training, internal validation, and external validation datasets, respectively. The GBDT model achieved the highest performance metrics, with an auROC of 0.860 (95% CI 0.839-0.880) in the internal validation dataset and 0.851 (95% CI 0.837-0.863) in the external validation dataset. Individuals with diabetes or those over 65 years of age showed low negative predictive value. In the interpretability analysis, age was the most important factor influencing the performance of the GBDT model, followed by sex and non-high-density lipoprotein cholesterol.

Conclusions: The ML models developed could provide good performance for CAS identification using routine health check-up indicators and could hopefully be applied in scenarios without ethnic and geographic heterogeneity for CAS prevention.
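The abstract reports model performance as auROC with a 95% confidence interval, and mentions negative predictive value (NPV) in subgroups. As a rough illustration of how such figures can be computed from predicted scores, here is a minimal standard-library Python sketch. The function names, the 0.5 decision threshold, and the percentile bootstrap used for the interval are all assumptions made for this example; the abstract does not state how the authors derived their intervals or thresholds.

```python
import random


def auroc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("auROC needs at least one positive and one negative")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def bootstrap_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for auROC (an assumed method, see lead-in)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def negative_predictive_value(labels, scores, threshold=0.5):
    """NPV = TN / (TN + FN); scores below the threshold count as predicted negative."""
    tn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 0)
    fn = sum(1 for y, s in zip(labels, scores) if s < threshold and y == 1)
    if tn + fn == 0:
        raise ValueError("no predicted negatives at this threshold")
    return tn / (tn + fn)


# Toy example with made-up labels and scores (not data from the study).
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))  # 0.75
```

In practice, the scores would come from a fitted classifier applied to a held-out set, and NPV would be recomputed within each subgroup of interest (for example, by age band) to see where negative predictions are least reliable.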
Pages: 13
Related papers
50 items in total
  • [31] Evaluating Explainable Machine Learning Models for Clinicians
    Scarpato, Noemi
    Nourbakhsh, Aria
    Ferroni, Patrizia
    Riondino, Silvia
    Roselli, Mario
    Fallucchi, Francesca
    Barbanti, Piero
    Guadagni, Fiorella
    Zanzotto, Fabio Massimo
    COGNITIVE COMPUTATION, 2024, 16 (04) : 1436 - 1446
  • [32] Development and external validation of a machine learning model for cardiac valve calcification early screening in dialysis patients: a multicenter study
    Wang, Xiaoxu
    Li, Yinfang
    Cao, Zixin
    Li, Yunuo
    Cao, Jingyuan
    Wang, Yao
    Li, Min
    Zheng, Jing
    Peng, Siqi
    Shi, Wen
    Wu, Qianqian
    Yang, Junlan
    Fang, Yaping
    Zhang, Aiqing
    Zhang, Xiaoliang
    Wang, Bin
    RENAL FAILURE, 2025, 47 (01)
  • [33] How can machine-learning methods assist in virtual screening for hyperuricemia? A healthcare machine-learning approach
    Ichikawa, Daisuke
    Saito, Toki
    Ujita, Waka
    Oyama, Hiroshi
    JOURNAL OF BIOMEDICAL INFORMATICS, 2016, 64 : 20 - 24
  • [34] Development and validation of three machine-learning models for predicting multiple organ failure in moderately severe and severe acute pancreatitis
    Qiu, Qiu
    Nian, Yong-jian
    Guo, Yan
    Tang, Liang
    Lu, Nan
    Wen, Liang-zhi
    Wang, Bin
    Chen, Dong-feng
    Liu, Kai-jun
    BMC GASTROENTEROLOGY, 2019, 19 (1)
  • [35] Development and validation of three machine-learning models for predicting multiple organ failure in moderately severe and severe acute pancreatitis
    Qiu Qiu
    Yong-jian Nian
    Yan Guo
    Liang Tang
    Nan Lu
    Liang-zhi Wen
    Bin Wang
    Dong-feng Chen
    Kai-jun Liu
    BMC Gastroenterology, 19
  • [36] Exploring Primary and Interaction Effects of Minor Physical Anomalies: Development and Validation of Prediction Models Using Explainable Machine Learning Algorithms for Early-Onset Schizophrenia
    Lin, Chih-Wei
    Lin, Jin-Jia
    Tseng, Huai-Hsuan
    Jang, Fong-Lin
    Lu, Ming-Kun
    Chen, Po-See
    Huang, Chih-Chun
    Yao, Chi-Yu
    Wang, Tzu-Yun
    Chang, Wei-Hung
    Tan, Hung-Pin
    Lin, Sheng-Hsiang
    SCHIZOPHRENIA BULLETIN, 2025,
  • [37] Development and validation of an ensemble machine-learning model for predicting early mortality among patients with bone metastases of hepatocellular carcinoma
    Long, Ze
    Yi, Min
    Qin, Yong
    Ye, Qianwen
    Che, Xiaotong
    Wang, Shengjie
    Lei, Mingxing
    FRONTIERS IN ONCOLOGY, 2023, 13
  • [38] Advancing interpretability of machine-learning prediction models
    Trenary, Laurie
    DelSole, Timothy
    ENVIRONMENTAL DATA SCIENCE, 2022, 1
  • [39] Machine-learning models for combinatorial catalyst discovery
    Landrum, GA
    Penzotti, JE
    Putta, S
    MEASUREMENT SCIENCE AND TECHNOLOGY, 2005, 16 (01) : 270 - 277
  • [40] SAnDReS 2.0: Development of machine-learning models to explore the scoring function space
    de Azevedo Jr, Walter Filgueira
    Quiroga, Rodrigo
    Villarreal, Marcos Ariel
    da Silveira, Nelson Jose Freitas
    Bitencourt-Ferreira, Gabriela
    da Silva, Amauri Duarte
    Veit-Acosta, Martina
    Oliveira, Patricia Rufino
    Tutone, Marco
    Biziukova, Nadezhda
    Poroikov, Vladimir
    Tarasova, Olga
    Baud, Stephaine
    JOURNAL OF COMPUTATIONAL CHEMISTRY, 2024, 45 (27) : 2333 - 2346