An intelligent learning system based on electronic health records for unbiased stroke prediction

Authors
Saleem, Muhammad Asim [1]
Javeed, Ashir [2]
Akarathanawat, Wasan [3, 4, 5]
Chutinet, Aurauma [3, 4, 5]
Suwanwela, Nijasri Charnnarong [3, 4, 5]
Kaewplung, Pasu [1]
Chaitusaney, Surachai [1]
Deelertpaiboon, Sunchai [1]
Srisiri, Wattanasak [1]
Benjapolakul, Watit [1]
Affiliations
[1] Chulalongkorn Univ, Fac Engn, Ctr Excellence Artificial Intelligence Machine Lea, Dept Elect Engn, Bangkok 10330, Thailand
[2] Karolinska Inst, Aging Res Ctr, S-17165 Stockholm, Sweden
[3] Chulalongkorn Univ, Fac Med, Dept Med, Div Neurol, Bangkok 10330, Thailand
[4] King Chulalongkorn Mem Hosp, Chulalongkorn Stroke Ctr, Thai Red Cross Soc, Bangkok 10330, Thailand
[5] King Chulalongkorn Mem Hosp, Chula Neurosci Ctr, Thai Red Cross Soc, Bangkok 10330, Thailand
Source
SCIENTIFIC REPORTS | 2024 / Volume 14 / Issue 01
Keywords
Stroke; Feature extraction; Machine learning; Imbalance classes; ROC curve
DOI
10.1038/s41598-024-73570-x
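Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]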
Discipline classification code
07; 0710; 09
Abstract
Stroke has a negative impact on people's lives and is one of the leading causes of death and disability worldwide. Early detection of symptoms can significantly help predict stroke and promote a healthy lifestyle. Researchers have developed several methods to predict stroke using machine learning (ML) techniques, but these systems suffer from two main problems. First, the ML models are biased because the classes in the dataset are unevenly distributed; recent research has not adequately addressed this problem, and no preventive measures have been taken. We use the Synthetic Minority Oversampling Technique (SMOTE) to remove this bias and balance the training of the proposed ML model. Second, existing ML models achieve low classification accuracy. To increase accuracy, we propose a learning system for stroke prediction that combines an autoencoder with a linear discriminant analysis (LDA) model. The autoencoder extracts relevant features from the feature space, and the extracted subset is then fed into the LDA model for stroke classification. The hyperparameters of the LDA model are found using a grid search strategy. Because the conventional accuracy metric alone does not truly reflect the performance of ML models, we employed several evaluation metrics to validate the proposed model: accuracy, sensitivity, specificity, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). The experimental results show that the proposed model achieves a sensitivity of 98.51% and a specificity of 97.56%, with an accuracy of 99.24% and a balanced accuracy of 98.00%.
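The abstract's point that plain accuracy can mislead on imbalanced classes can be illustrated with a minimal sketch. The class name ConfusionCounts, the helper functions, and the counts below are hypothetical and chosen only for illustration; they are not taken from the paper or its dataset. The formulas are the standard definitions of sensitivity, specificity, accuracy, and balanced accuracy.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from a binary confusion matrix (positive class = stroke)."""
    tp: int  # stroke cases predicted as stroke
    fn: int  # stroke cases predicted as non-stroke
    tn: int  # non-stroke cases predicted as non-stroke
    fp: int  # non-stroke cases predicted as stroke


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: share of actual positives that were detected."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: share of actual negatives that were rejected."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    """Mean of sensitivity and specificity; insensitive to class ratio."""
    return (sensitivity(c) + specificity(c)) / 2


# Hypothetical imbalanced test set: 50 stroke cases, 950 non-stroke cases.
# The classifier below misses most stroke cases yet still looks accurate.
counts = ConfusionCounts(tp=5, fn=45, tn=945, fp=5)

print(f"accuracy          = {accuracy(counts):.4f}")
print(f"sensitivity       = {sensitivity(counts):.4f}")
print(f"specificity       = {specificity(counts):.4f}")
print(f"balanced accuracy = {balanced_accuracy(counts):.4f}")
```

In this made-up case the classifier is right on 95% of samples while detecting only 10% of stroke cases, which is why sensitivity, specificity, and balanced accuracy are reported alongside accuracy.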
Pages: 14