Interpretable Data-Driven Approach Based on Feature Selection Methods and GAN-Based Models for Cardiovascular Risk Prediction in Diabetic Patients

Cited by: 2
Authors
Chushig-Muzo, David [1]
Calero-Diaz, Hugo [1]
Lara-Abelenda, Francisco J. [1]
Gomez-Martinez, Vanesa [1]
Granja, Conceicao [2]
Soguero-Ruiz, Cristina [1]
Affiliations
[1] Rey Juan Carlos Univ, Dept Signal Theory & Commun Telemat & Comp, Fuenlabrada 28943, Madrid, Spain
[2] Univ Hosp North Norway, Norwegian Ctr Ehlth Res, N-9038 Tromso, Norway
Source
IEEE ACCESS | 2024 / Vol. 12
Keywords
Diabetes; Predictive models; Training; Radio frequency; Data models; Biological system modeling; Feature extraction; Cardiovascular system; Generative adversarial networks; Cardiovascular risk prediction; type 1 diabetes; machine learning; interpretable methods; feature selection; generative adversarial networks; accumulated local effects; post-hoc interpretability; CTGAN; DISEASE; EVENTS; NETWORKS; 1ST
DOI
10.1109/ACCESS.2024.3412789
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Noncommunicable diseases (NCDs) are the leading cause of morbidity and mortality worldwide. Cardiovascular diseases (CVDs) and diabetes are the most prevalent NCDs, causing 17.9 million and 1.5 million deaths per year, respectively. Individuals diagnosed with type 1 diabetes (T1D) are at high risk of developing CVDs. Machine learning (ML) models have achieved outstanding results in many domains, including healthcare, where they can deliver high predictive performance. The aim of this study was to develop an interpretable data-driven approach to predict the 10-year CVD risk in older individuals with T1D, providing both reasonable predictive performance and the identification of risk factors associated with CVDs. Data from individuals with T1D at the Steno Diabetes Center Copenhagen were used. Several ML-based models were considered, including k-nearest neighbors (KNN), decision tree, random forest, and multilayer perceptron (MLP). To enhance the predictive performance of these models, the conditional tabular generative adversarial network (CTGAN) was used to create synthetic data and enlarge the training set. Several filter and wrapper feature selection (FS) techniques were applied to identify the features most relevant to CVD risk and to improve the performance of the ML-based models. To make the predictive models interpretable, two post-hoc methods were used: SHapley Additive exPlanations (SHAP) and accumulated local effects (ALE). The experimental results showed that FS combined with ML-based models performed well in predicting CVD risk. In particular, the MLP obtained the best results, with a mean absolute error of 0.0088 and a mean relative absolute error of 0.0817. Regarding risk factors, age, HbA1c, and albuminuria were identified as crucial for CVD risk prediction, in line with recent clinical evidence. Our study contributes to identifying CVD risk and associated risk factors in a data-driven manner, supporting early interventions and adequate treatments to prevent CVDs.
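The abstract reports two regression error metrics for the best model (the MLP) but does not define them. The sketch below assumes the common textbook definitions: MAE = (1/n) * sum(|y_i - yhat_i|), and the "mean relative absolute error" taken as the relative absolute error RAE = sum(|y_i - yhat_i|) / sum(|y_i - mean(y)|), i.e., the model's total error relative to a baseline that always predicts the mean observed risk. The study may define the second metric differently; the function names and the example risk values are hypothetical and are not data from the study.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_i - yhat_i|)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RAE = sum(|y_i - yhat_i|) / sum(|y_i - mean(y)|).

    ASSUMPTION: textbook relative absolute error; the study's
    "mean relative absolute error" may be defined differently.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Total error of a naive baseline that always predicts the mean risk.
    baseline_error = sum(abs(t - mean_true) for t in y_true)
    if baseline_error == 0:
        raise ValueError("y_true is constant, so RAE is undefined")
    model_error = sum(abs(t - p) for t, p in zip(y_true, y_pred))
    return model_error / baseline_error


if __name__ == "__main__":
    # Hypothetical 10-year CVD risk values for four patients (not study data).
    observed = [0.05, 0.10, 0.20, 0.15]
    predicted = [0.06, 0.09, 0.18, 0.15]
    print(f"MAE = {mean_absolute_error(observed, predicted):.4f}")
    print(f"RAE = {relative_absolute_error(observed, predicted):.4f}")
```

MAE is expressed in the same units as the predicted risk, whereas RAE is scale-free: a value below 1 means the model beats the mean-risk baseline.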
Pages: 84292-84305
Page count: 14
Related papers (50 in total)
  • [1] Data-driven cardiovascular risk prediction and prognosis factor identification in diabetic patients
    Calero-Diaz, Hugo
    Chushig-Muzo, David
    Fabelo, Himar
    Mora-Jimenez, Inmaculada
    Granja, Conceicao
    Soguero-Ruiz, Cristina
    2022 IEEE-EMBS INTERNATIONAL CONFERENCE ON BIOMEDICAL AND HEALTH INFORMATICS (BHI) JOINTLY ORGANISED WITH THE IEEE-EMBS INTERNATIONAL CONFERENCE ON WEARABLE AND IMPLANTABLE BODY SENSOR NETWORKS (BSN'22), 2022
  • [2] A GAN-Based Data Injection Attack Method on Data-Driven Strategies in Power Systems
    Liu, Zengji
    Wang, Qi
    Ye, Yujian
    Tang, Yi
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (04) : 3203 - 3213
  • [3] Data-Driven Diabetes Risk Factor Prediction Using Machine Learning Algorithms with Feature Selection Technique
    Kakoly, Israt Jahan
    Hoque, Md. Rakibul
    Hasan, Najmul
    SUSTAINABILITY, 2023, 15 (06)
  • [4] Performance evaluation of automated data-driven feature extraction and selection methods for practical and scalable building energy consumption prediction models
    Kim, Janghyun
    Frank, Stephen
    Buechler, Robert
    Mishra, Sakshi
    Petersen, Anya
    Zhang, Liang
    Eslinger, Hannah
    JOURNAL OF BUILDING ENGINEERING, 2025, 103
  • [5] Prediction-Based Power Consumption Monitoring of Industrial Equipment Using Interpretable Data-Driven Models
    Xiao, Hui
    Hu, Wenshan
    Zhou, Hong
    Liu, Guo-Ping
    IEEE TRANSACTIONS ON AUTOMATION SCIENCE AND ENGINEERING, 2024, 21 (02) : 1312 - 1322
  • [6] Integrating sensor data and GAN-based models to optimize medical university distribution: a data-driven approach for sustainable regional growth in Saudi Arabia
    Addas, Abdullah
    Khan, Muhammad Nasir
    Tahir, Muhammad
    Naseer, Fawad
    Gulzar, Yonis
    Onn, Choo Wou
    FRONTIERS IN EDUCATION, 2025, 10
  • [7] A causality based feature selection approach for data-driven dynamic security assessment
    Bellizio, Federica
    Cremer, Jochen L.
    Sun, Mingyang
    Strbac, Goran
    ELECTRIC POWER SYSTEMS RESEARCH, 2021, 201 (201)
  • [8] Efficient Data-Driven Machine Learning Models for Cardiovascular Diseases Risk Prediction
    Dritsas, Elias
    Trigka, Maria
    SENSORS, 2023, 23 (03)
  • [9] Data-Driven Approach based on Feature Selection Technique for Early Diagnosis of Alzheimer's Disease
    Thapa, Surendrabikram
    Singh, Priyanka
    Jain, Deepak Kumar
    Bharill, Neha
    Gupta, Akshansh
    Prasad, Mukesh
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020
  • [10] Feature selection and risk prediction for diabetic patients with ketoacidosis based on MIMIC-IV
    Liu, Yang
    Mo, Wei
    Wang, He
    Shao, Zixin
    Zeng, Yanping
    Bi, Jianlu
    FRONTIERS IN ENDOCRINOLOGY, 2024, 15