An efficient convolutional neural network for coronary heart disease prediction

Cited by: 106
Authors
Dutta, Aniruddha [1, 2]
Batabyal, Tamal [3, 4]
Basu, Meheli [5]
Acton, Scott T. [3, 6]
Affiliations
[1] Queens Univ, Dept Pathol & Mol Med, Kingston, ON K7L 3N6, Canada
[2] Univ Calif Berkeley, Haas Sch Business, Berkeley, CA 94720 USA
[3] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA 22904 USA
[4] Univ Virginia, Sch Med, Dept Neurol, Charlottesville, VA 22904 USA
[5] Univ Pittsburgh, Katz Grad Sch Business, Pittsburgh, PA 15260 USA
[6] Univ Virginia, Dept Biomed Engn, Charlottesville, VA 22904 USA
Keywords
Coronary heart disease; Machine learning; LASSO regression; Convolutional neural network; Artificial Intelligence; NHANES; RISK-FACTOR; CARDIOVASCULAR-DISEASE; SERUM CREATININE; DIETARY PATTERN; BLOOD-PRESSURE; PREVENTION; MORTALITY; DIAGNOSIS; MEN; ATHEROSCLEROSIS
DOI
10.1016/j.eswa.2020.113408
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
This study proposes an efficient neural network with convolutional layers to classify significantly class-imbalanced clinical data. The data is curated from the National Health and Nutrition Examination Survey (NHANES) with the goal of predicting the occurrence of Coronary Heart Disease (CHD). While the majority of the existing machine learning models that have been used on this class of data are vulnerable to class imbalance even after the adjustment of class-specific weights, our simple two-layer CNN exhibits resilience to the imbalance with fair harmony in class-specific performance. Given a highly imbalanced dataset, it is often challenging to simultaneously achieve a high class 1 accuracy (true CHD prediction rate) along with a high class 0 accuracy as the test data size increases. We adopt a two-step approach: first, we employ least absolute shrinkage and selection operator (LASSO) based feature weight assessment followed by majority-voting based identification of important features. Next, the important features are homogenized by using a fully connected layer, a crucial step before passing the output of the layer to successive convolutional stages. We also propose a training routine per epoch, akin to a simulated annealing process, to boost the classification accuracy. Despite a high class imbalance in the NHANES dataset, the investigation confirms that our proposed CNN architecture has the classification power of 77% to correctly classify the presence of CHD and 81.8% to accurately classify the absence of CHD on test data comprising 85.70% of the total dataset. This result signifies that the proposed architecture can be generalized to other studies in healthcare with a similar order of features and imbalances. While the recall values obtained from other machine learning methods, such as SVM and random forest, are comparable to those of our proposed CNN model, our model predicts the negative (Non-CHD) cases with higher accuracy.
Our model architecture offers a way forward to develop better investigative tools, improved medical treatment and lower diagnostic costs by incorporating a smart diagnostic system in the healthcare system. The balanced accuracy of our model (79.5%) is also better than the individual accuracies of SVM or random forest classifiers. The CNN classifier results in high specificity and test accuracy along with high values of recall and area under the curve (AUC). © 2020 Elsevier Ltd. All rights reserved.
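As an illustration of how the class-specific rates in the abstract relate to balanced accuracy, the following is a minimal Python sketch, not the authors' code. The function name `class_rates` and the confusion-matrix counts are hypothetical, chosen only so that the rates match the rounded 77% (class 1) and 81.8% (class 0) figures quoted above; they are not taken from the NHANES data.

```python
def class_rates(tp, fn, tn, fp):
    """Return (recall, specificity, balanced accuracy) from confusion-matrix counts."""
    recall = tp / (tp + fn)            # class 1 accuracy: CHD cases correctly flagged
    specificity = tn / (tn + fp)       # class 0 accuracy: non-CHD cases correctly cleared
    balanced = (recall + specificity) / 2
    return recall, specificity, balanced


# Hypothetical counts (assumption): 1,000 CHD cases and 10,000 non-CHD cases.
tp, fn, tn, fp = 770, 230, 8180, 1820

recall, specificity, balanced = class_rates(tp, fn, tn, fp)

# Plain accuracy pools both classes, so the majority class dominates it.
overall = (tp + tn) / (tp + fn + tn + fp)

print(f"recall={recall:.3f} specificity={specificity:.3f}")
print(f"balanced accuracy={balanced:.3f} overall accuracy={overall:.3f}")
```

With these rounded inputs the mean of the two class rates is about 79.4%, close to the reported 79.5%; the small gap presumably comes from rounding in the quoted percentages. Balanced accuracy weights both classes equally, which is why it is a more informative summary than plain accuracy on imbalanced data.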
Pages: 16