An efficient convolutional neural network for coronary heart disease prediction

Times cited: 106
Authors
Dutta, Aniruddha [1,2]
Batabyal, Tamal [3,4]
Basu, Meheli [5]
Acton, Scott T. [3,6]
Affiliations
[1] Queens Univ, Dept Pathol & Mol Med, Kingston, ON K7L 3N6, Canada
[2] Univ Calif Berkeley, Haas Sch Business, Berkeley, CA 94720 USA
[3] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
[4] Univ Virginia, Sch Med, Dept Neurol, Charlottesville, VA 22904 USA
[5] Univ Pittsburgh, Katz Grad Sch Business, Pittsburgh, PA 15260 USA
[6] Univ Virginia, Dept Biomed Engn, Charlottesville, VA 22904 USA
Keywords
Coronary heart disease; Machine learning; LASSO regression; Convolutional neural network; Artificial intelligence; NHANES; Risk factor; Cardiovascular disease; Serum creatinine; Dietary pattern; Blood pressure; Prevention; Mortality; Diagnosis; Men; Atherosclerosis
DOI
10.1016/j.eswa.2020.113408
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This study proposes an efficient neural network with convolutional layers to classify highly class-imbalanced clinical data. The data are curated from the National Health and Nutrition Examination Survey (NHANES) with the goal of predicting the occurrence of coronary heart disease (CHD). Most existing machine learning models applied to this class of data remain vulnerable to class imbalance even after class-specific weights are adjusted; in contrast, our simple two-layer CNN is resilient to the imbalance and achieves well-balanced class-specific performance. With a highly imbalanced dataset, it is often difficult to achieve both a high class 1 accuracy (the rate of correctly predicted CHD cases) and a high class 0 accuracy, particularly as the test set grows. We adopt a two-step approach. First, we use least absolute shrinkage and selection operator (LASSO) regression to assess feature weights, followed by majority voting to identify the important features. Next, the important features are homogenized by a fully connected layer, a crucial step before the layer's output is passed to the successive convolutional stages. We also propose a per-epoch training routine, akin to a simulated annealing process, to boost classification accuracy. Despite the high class imbalance in the NHANES dataset, our proposed CNN architecture correctly classifies 77% of CHD cases and 81.8% of non-CHD cases on a test set comprising 85.70% of the total dataset. This result indicates that the proposed architecture can be generalized to other healthcare studies with features and imbalances of a similar order. While the recall values of other machine learning methods, such as SVM and random forest, are comparable to that of our CNN model, our model predicts negative (non-CHD) cases with higher accuracy.
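The two-step feature-selection stage (LASSO weights, then a majority vote) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the vote is formed, so the sketch hypothetically refits LASSO on bootstrap resamples and keeps a feature when its weight is nonzero in more than half of the fits. The function names `lasso_cd` and `majority_vote_select`, the penalty `alpha`, the number of rounds, and the synthetic data are all illustrative choices.

```python
import numpy as np


def lasso_cd(X, y, alpha, n_iter=200):
    """Coordinate-descent LASSO for (1/(2n))*||y - X w||^2 + alpha*||w||_1.

    Assumes the columns of X are standardized and y is centered, so no
    intercept term is fitted.
    """
    n, d = X.shape
    w = np.zeros(d)
    col_sq = (X ** 2).sum(axis=0) / n   # per-feature curvature of the loss
    residual = y.astype(float).copy()   # equals y - X @ w while w == 0
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue                # constant column: leave weight at 0
            residual += X[:, j] * w[j]  # remove feature j from the current fit
            rho = X[:, j] @ residual / n
            # Soft-thresholding drives weak features exactly to zero.
            w[j] = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            residual -= X[:, j] * w[j]  # restore feature j with its new weight
    return w


def majority_vote_select(X, y, alpha, n_rounds=15, seed=0):
    """Hypothetical voting scheme: refit LASSO on bootstrap resamples and keep
    features whose weight is nonzero in more than half of the rounds."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    votes = np.zeros(d, dtype=int)
    for _ in range(n_rounds):
        idx = rng.integers(0, n, size=n)               # bootstrap sample
        Xb, yb = X[idx], y[idx].astype(float)
        Xb = (Xb - Xb.mean(axis=0)) / Xb.std(axis=0)   # standardize columns
        w = lasso_cd(Xb, yb - yb.mean(), alpha)
        votes += np.abs(w) > 1e-10                     # one vote per nonzero weight
    return np.flatnonzero(votes > n_rounds / 2), votes


# Synthetic toy data (not NHANES): only features 0 and 3 drive the label.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 8))
y = (X[:, 0] + 0.8 * X[:, 3] + 0.5 * rng.standard_normal(500) > 0).astype(int)
selected, votes = majority_vote_select(X, y, alpha=0.1)
print(selected, votes)
```

Because only features 0 and 3 carry signal in this toy data, they are the ones expected to survive the vote. In practice, scikit-learn's `Lasso` (which minimizes the same objective) would normally replace the hand-written coordinate descent.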
Our model architecture points the way toward better investigative tools, improved medical treatment, and lower diagnostic costs by incorporating a smart diagnostic system into healthcare. The balanced accuracy of our model (79.5%) is also higher than the accuracies of the individual SVM and random forest classifiers. The CNN classifier achieves high specificity and test accuracy along with high recall and area under the curve (AUC). © 2020 Elsevier Ltd. All rights reserved.
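In the usual binary definition, balanced accuracy is the mean of the two per-class accuracies: sensitivity (recall on class 1, CHD) and specificity (recall on class 0, non-CHD). This makes it a natural summary for an imbalanced test set. The short sketch below uses made-up toy labels, not NHANES data, to show how a classifier that always predicts the majority class scores well on plain accuracy but only 0.5 on balanced accuracy. The function names are illustrative.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (class-1 recall) and specificity (class-0 recall)
    for binary labels, with 1 = CHD and 0 = non-CHD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy imbalanced test set: 10 positive (CHD) and 90 negative cases.
y_true = [1] * 10 + [0] * 90

# A trivial classifier that always predicts "no CHD".
always_negative = [0] * 100
print(accuracy(y_true, always_negative), balanced_accuracy(y_true, always_negative))

# A classifier that finds 7 of 10 CHD cases and 81 of 90 non-CHD cases.
model_pred = [1] * 7 + [0] * 3 + [0] * 81 + [1] * 9
print(accuracy(y_true, model_pred), balanced_accuracy(y_true, model_pred))
```

The trivial classifier reaches 0.9 plain accuracy but only 0.5 balanced accuracy, while the second reaches 0.88 and 0.8. For binary labels, scikit-learn's `balanced_accuracy_score` computes the same quantity.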
Pages: 16