Discovering biomarkers associated and predicting cardiovascular disease with high accuracy using a novel nexus of machine learning techniques for precision medicine

Cited by: 61
Authors
Degroat, William [1]
Abdelhalim, Habiba [1]
Patel, Kush [1]
Mendhe, Dinesh [1]
Zeeshan, Saman [2]
Ahmed, Zeeshan [1, 3]
Affiliations
[1] Rutgers State Univ, Rutgers Inst Hlth, Hlth Care Policy & Aging Res, 112 Paterson St, New Brunswick, NJ 08901 USA
[2] Rutgers State Univ, Rutgers Canc Inst New Jersey, 195 Little Albany St, New Brunswick, NJ USA
[3] Univ Med & Dent New Jersey, Rutgers Biomed & Hlth Sci, Dept Med Cardiovasc Dis & Hypertens, 125 Paterson St, New Brunswick, NJ 08901 USA
Keywords
GENOME-WIDE ASSOCIATION; PATHOGENESIS; MANAGEMENT; GENETICS; COMPLEX; RISK
DOI
10.1038/s41598-023-50600-8
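Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences in general]
Subject classification codes
07; 0710; 09
Abstract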
Personalized interventions are deemed vital given the intricate characteristics, progression, inherent genetic composition, and diversity of cardiovascular diseases (CVDs). Appropriate use of artificial intelligence (AI) and machine learning (ML) methodologies can yield new insights into CVDs, enabling better personalized treatment through predictive analysis and deep phenotyping. In this study, we proposed and employed a novel approach combining traditional statistics and a nexus of cutting-edge AI/ML techniques to identify significant biomarkers for our predictive engine by analyzing the complete transcriptome of CVD patients. After robust pre-processing of the gene expression data, we used three statistical tests (Pearson correlation, chi-square test, and ANOVA) to assess differences in transcriptomic expression and clinical characteristics between healthy individuals and CVD patients. Next, a recursive feature elimination (RFE) classifier ranked transcriptomic features by their relation to the case-control variable. The top ten percent of commonly observed significant biomarkers were evaluated using four distinct ML classifiers (Random Forest, Support Vector Machine, eXtreme Gradient Boosting (XGBoost) decision trees, and k-Nearest Neighbors). After hyperparameter optimization, the models were ensembled with a soft voting classifier, which accurately differentiated patients from healthy individuals. We uncovered 18 transcriptomic biomarkers that are highly significant in the CVD population and used them to predict disease with up to 96% accuracy. Additionally, we cross-validated our results with clinical records collected from patients in our cohort. The identified biomarkers are potential indicators for the early detection of CVDs. With its successful implementation, our newly developed predictive engine provides a valuable framework for identifying patients with CVDs based on their biomarker profiles.
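The feature-selection and ensembling steps described in the abstract can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code: the gene names, ranks, and probabilities are hypothetical placeholders, and applying the ten-percent cutoff to an RFE-style ranking (rank 1 = strongest) is an assumption about how that cutoff works. Plain gradient-boosted trees stand in for XGBoost in the comments. The example shows why soft voting, which averages each model's class probabilities, can disagree with a simple majority vote when one model is far more confident than the others.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence


def top_fraction_by_rank(ranking: Dict[str, int], fraction: float = 0.10) -> List[str]:
    """Keep the best-ranked `fraction` of features (rank 1 = strongest, RFE-style).

    sorted() is stable, so features with equal rank keep their input order.
    At least one feature is always returned.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    k = max(1, int(len(ranking) * fraction))
    return sorted(ranking, key=ranking.__getitem__)[:k]


def soft_vote(model_probs: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of several models' class-probability vectors for one sample."""
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_classes = len(model_probs[0])
    return [
        sum(w * probs[c] for w, probs in zip(weights, model_probs)) / total
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value (the first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def hard_vote(model_probs: Sequence[Sequence[float]]) -> int:
    """Majority vote over each model's own predicted class, for comparison."""
    votes = Counter(argmax(probs) for probs in model_probs)
    return votes.most_common(1)[0][0]


# Hypothetical RFE-style ranks for 20 placeholder genes (not data from the study).
ranks = [4, 1, 9, 2, 20, 7, 3, 15, 11, 5, 18, 6, 13, 8, 19, 10, 14, 12, 17, 16]
ranking = {f"GENE_{i:02d}": r for i, r in enumerate(ranks, start=1)}
selected = top_fraction_by_rank(ranking, 0.10)
print(selected)  # ['GENE_02', 'GENE_04']

# Hypothetical [P(healthy), P(CVD)] for one individual from four models.
model_probs = [
    [0.45, 0.55],  # random forest
    [0.45, 0.55],  # support vector machine (with probability estimates)
    [0.48, 0.52],  # gradient-boosted trees (stand-in for XGBoost)
    [0.95, 0.05],  # k-nearest neighbors
]
averaged = soft_vote(model_probs)
# Majority vote says CVD (class 1); the probability average says healthy (class 0).
print(hard_vote(model_probs), argmax(averaged))  # 1 0
```

In practice, scikit-learn offers both steps directly: `sklearn.feature_selection.RFE` exposes a `ranking_` attribute in which selected features have rank 1, and `sklearn.ensemble.VotingClassifier(voting="soft")` averages the `predict_proba` outputs of its estimators (an `SVC` must be created with `probability=True` to take part).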
Pages: 13