Machine learning based refined differential gene expression analysis of pediatric sepsis

Cited by: 39
Authors
Abbas, Mostafa [1]
EL-Manzalawy, Yasser [1, 2]
Affiliations
[1] Geisinger Health System, Department of Imaging Science and Innovation, Danville, PA 17822, USA
[2] Geisinger Health System, Department of Biomedical and Translational Informatics, Danville, PA 17822, USA
Keywords
Biomarkers discovery; Differential expression analysis; Refined differential gene expression analysis; Feature selection; Algorithms
DOI
10.1186/s12920-020-00771-4
CLC number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
Background: Differential expression (DE) analysis of transcriptomic data enables genome-wide analysis of gene expression changes associated with biological conditions of interest. Such analysis often yields a long list of genes that are differentially expressed between two or more groups. The identified differentially expressed genes (DEGs) are typically subjected to further downstream analysis to gain biological insight, for example by determining enriched functional pathways or Gene Ontology terms. DEGs are also treated as candidate biomarkers, from which a small subset may be selected as biomarkers using either biological knowledge or data-driven approaches.

Methods: In this work, we present a novel approach for identifying biomarkers from a list of DEGs by re-ranking them according to the Minimum Redundancy Maximum Relevance (MRMR) criterion using a repeated cross-validation feature selection procedure.

Results: Using gene expression profiles of 199 children with sepsis and septic shock, we identify 108 DEGs and propose a 10-gene signature for reliably predicting pediatric sepsis mortality, with an estimated area under the ROC curve (AUC) of 0.89.

Conclusions: Machine learning-based refinement of DE analysis is a promising tool for prioritizing DEGs and discovering biomarkers from gene expression profiles. Moreover, our reported 10-gene signature for pediatric sepsis mortality may facilitate the development of reliable diagnostic and prognostic biomarkers for sepsis.
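The abstract names the ingredients of the method (MRMR re-ranking of DEGs inside repeated cross-validation) but not its settings, so the following is a minimal Python sketch under stated assumptions, not the authors' implementation. Assumed here: expression values are discretized into three equal-frequency bins within each training fold, MRMR uses the greedy mutual-information-difference (MID) form, and genes are re-ranked by how often they are selected across 10 repeats of 5-fold cross-validation, with ties broken by mean selection position. The function names (`discretize`, `mutual_info`, `mrmr_rank`, `repeated_cv_rerank`) and the toy genes are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sketch: bin count, MID scoring, fold/repeat counts and the
# frequency-based aggregation rule are assumptions, not the paper's settings.


def discretize(values, n_bins=3):
    """Equal-frequency binning of one gene's expression values into n_bins levels."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, idx in enumerate(order):
        bins[idx] = rank * n_bins // len(values)
    return bins


def mutual_info(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint, px, py = Counter(zip(x, y)), Counter(x), Counter(y)
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in joint.items())


def mrmr_rank(X, y, k):
    """Greedy MRMR, MID form: relevance I(g; y) minus mean I(g; s) over selected s.

    X maps gene name -> discretized expression list; returns up to k gene names.
    """
    relevance = {g: mutual_info(col, y) for g, col in X.items()}
    selected, remaining = [], sorted(X)  # sorted for deterministic tie-breaking

    def score(g):
        if not selected:
            return relevance[g]
        redundancy = sum(mutual_info(X[g], X[s]) for s in selected) / len(selected)
        return relevance[g] - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # max keeps the first of tied genes
        selected.append(best)
        remaining.remove(best)
    return selected


def repeated_cv_rerank(X, y, k, n_folds=5, n_repeats=10, seed=0):
    """Re-rank genes by how often MRMR picks them on the training part of each fold.

    X maps gene name -> continuous expression list (e.g. the DEGs only).
    """
    rng = random.Random(seed)
    n = len(y)
    counts, pos_sum = Counter(), defaultdict(float)
    for _ in range(n_repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        for f in range(n_folds):
            held_out = set(idx[f::n_folds])
            train = [i for i in range(n) if i not in held_out]
            # Discretize inside the fold so held-out samples do not shape the bins.
            X_train = {g: discretize([col[i] for i in train]) for g, col in X.items()}
            y_train = [y[i] for i in train]
            for pos, g in enumerate(mrmr_rank(X_train, y_train, k)):
                counts[g] += 1
                pos_sum[g] += pos
    # Most often selected first; ties broken by lower mean position, then by name.
    return sorted(counts, key=lambda g: (-counts[g], pos_sum[g] / counts[g], g))


if __name__ == "__main__":
    # Toy data (hypothetical): G1 tracks the label, G2 duplicates G1, N1/N2 are scrambled.
    y = [0] * 20 + [1] * 20
    X = {
        "G1": [float(i) for i in range(40)],
        "G2": [float(i) for i in range(40)],
        "N1": [float((7 * i) % 40) for i in range(40)],
        "N2": [float((11 * i) % 40) for i in range(40)],
    }
    print(repeated_cv_rerank(X, y, k=2))
```

In the toy run, G2 carries exactly the same signal as G1, so once G1 is chosen its redundancy term outweighs its relevance and MRMR passes over it. This is what separates MRMR re-ranking from simply sorting DEGs by fold change or p-value, where G1 and G2 would rank side by side.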
Pages: 10