A biological network-based regularized artificial neural network model for robust phenotype prediction from gene expression data

Cited by: 29
Authors
Kang, Tianyu [1]
Ding, Wei [1]
Zhang, Luoyan [1]
Ziemek, Daniel [2]
Zarringhalam, Kourosh [3]
Affiliations
[1] Univ Massachusetts Boston, Dept Comp Sci, 100 Morrissey Blvd, Boston, MA 02125 USA
[2] Pfizer Worldwide Res & Dev, Inflammat & Immunol, Berlin, Germany
[3] Univ Massachusetts Boston, Dept Math, 100 Morrissey Blvd, Boston, MA 02125 USA
Funding
National Science Foundation (USA);
Keywords
Artificial neural network; Gene regulatory networks; Prediction of response; Clinical trial; Group Lasso; CANCER; SELECTION; DISEASE; RARE; CLASSIFICATION; SCHIZOPHRENIA; DIAGNOSIS;
DOI
10.1186/s12859-017-1984-2
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: Stratification of patient subpopulations that respond favorably to treatment or experience an adverse reaction is an essential step toward the development of new personalized therapies and diagnostics. It is currently feasible to generate omic-scale biological measurements for all patients in a study, providing an opportunity for machine learning models to identify molecular markers for disease diagnosis and progression. However, the high variability of genetic background in human populations hampers the reproducibility of omic-scale markers. In this paper, we develop a biological network-based regularized artificial neural network model for prediction of phenotype from transcriptomic measurements in clinical trials. To improve model sparsity and the overall reproducibility of the model, we incorporate into the model a regularization term that simultaneously shrinks gene sets defined by active upstream regulatory mechanisms.
Results: We benchmark our method against various regression, support vector machine and artificial neural network models and demonstrate its ability to predict clinical outcomes using clinical trial data on acute rejection in kidney transplantation and response to Infliximab in ulcerative colitis. We show that integration of prior biological knowledge into the classification, as developed in this paper, significantly improves the robustness and generalizability of predictions to independent datasets. We provide Java code for our algorithm along with a parsed version of the STRING DB database.
Conclusion: In summary, we present a method for prediction of clinical phenotypes using baseline genome-wide expression data that makes use of prior biological knowledge on gene-regulatory interactions in order to increase the robustness and reproducibility of omic-scale markers. The integrated group-wise regularization method increases the interpretability of biological signatures and gives stable performance estimates across independent test sets.
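The abstract describes group-wise shrinkage of gene sets that share an upstream regulator. The sketch below illustrates that idea with a standard group Lasso penalty and its group soft-thresholding step applied to the input-layer weights of a network. It is not the authors' Java implementation: the function names, gene and regulator labels, the square-root group-size weighting, and the assumption that groups do not overlap are all illustrative choices made here.

```python
import math

def group_lasso_penalty(weights, groups, lam):
    """Sum over groups of sqrt(group size) * L2 norm of the group's weights.

    weights: dict mapping gene -> list of input-layer weights (hypothetical layout)
    groups:  dict mapping regulator -> list of genes it regulates
             (assumed non-overlapping in this sketch)
    lam:     regularization strength
    """
    total = 0.0
    for genes in groups.values():
        sq_sum = sum(w * w for g in genes for w in weights[g])
        size = sum(len(weights[g]) for g in genes)
        total += math.sqrt(size) * math.sqrt(sq_sum)
    return lam * total

def group_soft_threshold(weights, groups, step):
    """Proximal step for the penalty above: shrink each group toward zero,
    and set the whole group to zero when its norm is below the threshold."""
    shrunk = {g: list(ws) for g, ws in weights.items()}
    for genes in groups.values():
        norm = math.sqrt(sum(w * w for g in genes for w in weights[g]))
        size = sum(len(weights[g]) for g in genes)
        thresh = step * math.sqrt(size)
        scale = 0.0 if norm <= thresh else 1.0 - thresh / norm
        for g in genes:
            shrunk[g] = [scale * w for w in weights[g]]
    return shrunk

# Hypothetical toy data: three genes, two hidden units, two regulator groups.
weights = {"GENE_A": [3.0, 4.0], "GENE_B": [0.0, 0.0], "GENE_C": [0.1, -0.1]}
groups = {"REG_1": ["GENE_A", "GENE_B"], "REG_2": ["GENE_C"]}

penalty = group_lasso_penalty(weights, groups, lam=1.0)
updated = group_soft_threshold(weights, groups, step=0.5)
```

Because the threshold acts on a whole group at once, a weakly supported gene set (here the one under `REG_2`) is removed together, which is what makes the resulting signature sparse at the level of regulatory mechanisms and not only at the level of single genes.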
Pages: 11