MultiPLIER: A Transfer Learning Framework for Transcriptomics Reveals Systemic Features of Rare Disease

Cited by: 81
Authors
Taroni, Jaclyn N. [1, 2]
Grayson, Peter C. [3]
Hu, Qiwen [1]
Eddy, Sean [4]
Kretzler, Matthias [4, 5]
Merkel, Peter A. [6, 7]
Greene, Casey S. [1, 2, 8, 9]
Affiliations
[1] Univ Penn, Syst Pharmacol & Translat Therapeut, Philadelphia, PA 19104 USA
[2] Alexs Lemonade Stand Fdn, Childhood Canc Data Lab, Philadelphia, PA 19004 USA
[3] NIAMSD, NIH, Bethesda, MD 20892 USA
[4] Michigan Med, Dept Internal Med, Div Nephrol, Ann Arbor, MI USA
[5] Michigan Med, Dept Computat Med & Bioinformat, Ann Arbor, MI USA
[6] Univ Penn, Div Rheumatol, Philadelphia, PA 19104 USA
[7] Univ Penn, Dept Biostat Epidemiol & Informat, Philadelphia, PA 19104 USA
[8] Univ Penn, Inst Translat Med & Therapeut, Philadelphia, PA 19104 USA
[9] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
Funding
National Institutes of Health (NIH), USA;
Keywords
GENE-EXPRESSION; MODULAR ANALYSIS; MEDULLOBLASTOMA; GENERATION; SUBGROUPS; GENOMICS;
DOI
10.1016/j.cels.2019.04.003
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704;
Abstract
Most gene expression datasets generated by individual researchers are too small to fully benefit from unsupervised machine-learning methods. In the case of rare diseases, there may be too few cases available, even when multiple studies are combined. To address this challenge, we utilize transfer learning to extract coordinated expression patterns and use learned patterns to analyze small rare disease datasets. We trained a pathway-level information extractor (PLIER) model on a large public data compendium comprising multiple experiments, tissues, and biological conditions and then transferred the model to small datasets in an approach we call MultiPLIER. Models constructed from the public data compendium included features that aligned well to known biological factors and were more comprehensive than those constructed from individual datasets or conditions. When transferred to rare disease datasets, the models describe biological processes related to disease severity more effectively than models trained only on a given dataset.
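The transfer step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a truncated SVD stands in for PLIER training, the ridge-regression projection is an assumed way of applying learned loadings to a new dataset, and the function names, the `l2` penalty, and the simulated data are all hypothetical.

```python
import numpy as np


def learn_loadings(compendium, k):
    """Stand-in for PLIER training (assumption): keep the top-k left
    singular vectors of a genes x samples matrix as gene loadings."""
    u, _, _ = np.linalg.svd(compendium, full_matrices=False)
    return u[:, :k]


def project(loadings, expression, l2=0.1):
    """Assumed transfer step: ridge-regress a new genes x samples matrix
    onto the learned loadings to obtain latent-variable values."""
    k = loadings.shape[1]
    gram = loadings.T @ loadings + l2 * np.eye(k)
    return np.linalg.solve(gram, loadings.T @ expression)


# Simulated data: 200 genes driven by 3 shared latent patterns.
rng = np.random.default_rng(0)
true_loadings = rng.normal(size=(200, 3))

# Large "compendium" (500 samples) and small "rare disease" set (8 samples).
large = true_loadings @ rng.normal(size=(3, 500)) + 0.1 * rng.normal(size=(200, 500))
small = true_loadings @ rng.normal(size=(3, 8)) + 0.1 * rng.normal(size=(200, 8))

# Learn patterns on the large matrix, then reuse them on the small one.
loadings = learn_loadings(large, k=3)
coefficients = project(loadings, small)

# How well the transferred patterns reconstruct the small dataset.
reconstruction = loadings @ coefficients
relative_error = np.linalg.norm(small - reconstruction) / np.linalg.norm(small)
print(coefficients.shape, round(float(relative_error), 3))
```

The point of the sketch is that the small dataset never contributes to learning the patterns; it is only expressed in terms of patterns learned from the larger compendium.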
Pages: 380 / +
Page count: 19
Related papers
53 in total
[1]   Immune response in silico (IRIS): immune-specific genes identified from a compendium of microarray expression data [J].
Abbas, AR ;
Baldwin, D ;
Ma, Y ;
Ouyang, W ;
Gurney, A ;
Martin, F ;
Fong, S ;
Campagne, MV ;
Godowski, P ;
Williams, PM ;
Chan, AC ;
Clark, HF .
GENES AND IMMUNITY, 2005, 6 (04) :319-331
[2]   A Generalized Least-Square Matrix Decomposition [J].
Allen, Genevera I. ;
Grosenick, Logan ;
Taylor, Jonathan .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2014, 109 (505) :145-159
[3]   Personalized Immunomonitoring Uncovers Molecular Networks that Stratify Lupus Patients [J].
Banchereau, Romain ;
Hong, Seunghee ;
Cantarel, Brandi ;
Baldwin, Nicole ;
Baisch, Jeanine ;
Edens, Michelle ;
Cepika, Alma-Martina ;
Acs, Peter ;
Turner, Jacob ;
Anguiano, Esperanza ;
Vinod, Parvathi ;
Kahn, Shaheen ;
Obermoser, Gerlinde ;
Blankenship, Derek ;
Wakeland, Edward ;
Nassi, Lorien ;
Gotte, Alisa ;
Punaro, Marilynn ;
Liu, Yong-Jun ;
Banchereau, Jacques ;
Rossello-Urgell, Jose ;
Wright, Tracey ;
Pascual, Virginia .
CELL, 2016, 165 (03) :551-565
[4]   NCBI GEO: archive for functional genomics data sets-update [J].
Barrett, Tanya ;
Wilhite, Stephen E. ;
Ledoux, Pierre ;
Evangelista, Carlos ;
Kim, Irene F. ;
Tomashevsky, Maxim ;
Marshall, Kimberly A. ;
Phillippy, Katherine H. ;
Sherman, Patti M. ;
Holko, Michelle ;
Yefanov, Andrey ;
Lee, Hyeseung ;
Zhang, Naigong ;
Robertson, Cynthia L. ;
Serova, Nadezhda ;
Davis, Sean ;
Soboleva, Alexandra .
NUCLEIC ACIDS RESEARCH, 2013, 41 (D1) :D991-D995
[5]   Estimating the population abundance of tissue-infiltrating immune and stromal cell populations using gene expression [J].
Becht, Etienne ;
Giraldo, Nicolas A. ;
Lacroix, Laetitia ;
Buttard, Benedicte ;
Elarouci, Nabila ;
Petitprez, Florent ;
Selves, Janick ;
Laurent-Puig, Pierre ;
Sautes-Fridman, Catherine ;
Fridman, Wolf H. ;
de Reynies, Aurelien .
GENOME BIOLOGY, 2016, 17
[6]   Neutrophils, from Marrow to Microbes [J].
Borregaard, Niels .
IMMUNITY, 2010, 33 (05) :657-670
[7]   A modular analysis framework for blood genomics studies: Application to systemic lupus erythematosus [J].
Chaussabel, Damien ;
Quinn, Charles ;
Shen, Jing ;
Patel, Pinakeen ;
Glaser, Casey ;
Baldwin, Nicole ;
Stichweh, Dorothee ;
Blankenship, Derek ;
Li, Lei ;
Munagala, Indira ;
Bennett, Lynda ;
Allantaz, Florence ;
Mejias, Asuncion ;
Ardura, Monica ;
Kaizer, Ellen ;
Monnet, Laurence ;
Allman, Windy ;
Randall, Henry ;
Johnson, Diane ;
Lanier, Aimee ;
Punaro, Marilynn ;
Wittkowski, Knut M. ;
White, Perrin ;
Fay, Joseph ;
Klintmalm, Goran ;
Ramilo, Octavio ;
Palucka, A. Karolina ;
Banchereau, Jacques ;
Pascual, Virginia .
IMMUNITY, 2008, 29 (01) :150-164
[8]   Transcription of Proteinase 3 and Related Myelopoiesis Genes in Peripheral Blood Mononuclear Cells of Patients With Active Wegener's Granulomatosis [J].
Cheadle, Chris ;
Berger, Alan E. ;
Andrade, Felipe ;
James, Regina ;
Johnson, Kristen ;
Watkins, Tonya ;
Park, Jin Kyun ;
Chen, Yu-Chi ;
Ehrlich, Eva ;
Mullins, Marissa ;
Chrest, Francis ;
Barnes, Kathleen C. ;
Levine, Stuart M. .
ARTHRITIS AND RHEUMATISM, 2010, 62 (06) :1744-1754
[9]   Modular Transcriptional Repertoire Analyses of Adults With Systemic Lupus Erythematosus Reveal Distinct Type I and Type II Interferon Signatures [J].
Chiche, Laurent ;
Jourde-Chiche, Noemie ;
Whalen, Elizabeth ;
Presnell, Scott ;
Gersuk, Vivian ;
Dang, Kristen ;
Anguiano, Esperanza ;
Quinn, Charlie ;
Burtey, Stephane ;
Berland, Yvon ;
Kaplanski, Gilles ;
Harle, Jean-Robert ;
Pascual, Virginia ;
Chaussabel, Damien .
ARTHRITIS & RHEUMATOLOGY, 2014, 66 (06) :1583-1595
[10]   CellCODE: a robust latent variable approach to differential expression analysis for heterogeneous cell populations [J].
Chikina, Maria ;
Zaslavsky, Elena ;
Sealfon, Stuart C. .
BIOINFORMATICS, 2015, 31 (10) :1584-1591