Transfer learning enables predictions in network biology

Cited by: 226
Authors
Theodoris, Christina V. [1, 2, 3, 4]
Xiao, Ling [2, 5]
Chopra, Anant [6]
Chaffin, Mark D. [2]
Al Sayed, Zeina R. [2]
Hill, Matthew C. [2, 5]
Mantineo, Helene [2, 5]
Brydon, Elizabeth M. [6]
Zeng, Zexian [1, 7]
Liu, X. Shirley [1, 7, 8]
Ellinor, Patrick T. [2, 5]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02215 USA
[2] Broad Inst MIT & Harvard, Cardiovasc Dis Initiat & Precis Cardiol Lab, Cambridge, MA 02142 USA
[3] Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02115 USA
[4] Harvard Med Sch, Genet Training Program, Boston, MA 02115 USA
[5] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[6] Bayer US LLC, Precis Cardiol Lab, Cambridge, MA USA
[7] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[8] Dana Farber Canc Inst, Ctr Funct Canc Epigenet, Boston, MA USA
Funding
US National Institutes of Health (NIH);
Keywords
SINGLE-CELL TRANSCRIPTOME; IN-VITRO; DIFFERENTIATION; MUTATIONS; GENES; HETEROGENEITY; TRAJECTORIES; LANDSCAPE; ORGANOIDS; SUBSETS;
DOI
10.1038/s41586-023-06139-9
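Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [Natural sciences, general];
Discipline classification code
07; 0710; 09;
Abstract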
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1,2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Pages: 616 - 624
Page count: 32
Related papers
159 in total
  • [31] Single-Cell Heterogeneity Analysis and CRISPR Screen Identify Key β-Cell-Specific Disease Genes
    Fang, Zhou
    Weng, Chen
    Li, Haiyan
    Tao, Ran
    Mai, Weihua
    Liu, Xiaoxiao
    Lu, Leina
    Lai, Sisi
    Duan, Qing
    Alvarez, Carlos
    Arvan, Peter
    Wynshaw-Boris, Anthony
    Li, Yun
    Pei, Yanxin
    Jin, Fulai
    Li, Yan
    [J]. CELL REPORTS, 2019, 26 (11) : 3132 - +
  • [32] PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data
    Franzen, Oscar
    Gan, Li-Ming
    Bjorkegren, Johan L. M.
    [J]. DATABASE-THE JOURNAL OF BIOLOGICAL DATABASES AND CURATION, 2019
  • [33] A Hierarchy of Proliferative and Migratory Keratinocytes Maintains the Tympanic Membrane
    Frumm, Stacey M.
    Yu, Shengyang Kevin
    Chang, Joseph
    Artichoker, Jordan A.
    Scaria, Sonia M.
    Lee, Katharine P.
    Byrnes, Lauren E.
    Sneddon, Julie B.
    Tward, Aaron D.
    [J]. CELL STEM CELL, 2021, 28 (02) : 315 - +
  • [34] Aryl Hydrocarbon Receptor Controls Monocyte Differentiation into Dendritic Cells versus Macrophages
    Goudot, Christel
    Coillard, Alice
    Villani, Alexandra-Chloe
    Gueguen, Paul
    Cros, Adeline
    Sarkizova, Siranush
    Tang-Huau, Tsing-Lee
    Bohec, Mylene
    Baulande, Sylvain
    Hacohen, Nir
    Amigorena, Sebastian
    Segura, Elodie
    [J]. IMMUNITY, 2017, 47 (03) : 582 - +
  • [35] An Integrated Gene Expression Landscape Profiling Approach to Identify Lung Tumor Endothelial Cell Heterogeneity and Angiogenic Candidates
    Goveia, Jermaine
    Rohlenova, Katerina
    Taverna, Federico
    Treps, Lucas
    Conradi, Lena-Christin
    Pircher, Andreas
    Geldhof, Vincent
    de Rooij, Laura P. M. H.
    Kalucka, Joanna
    Sokol, Liliana
    Garcia-Caballero, Melissa
    Zheng, Yingfeng
    Qian, Junbin
    Teuwen, Laure-Anne
    Khan, Shawez
    Boeckx, Bram
    Wauters, Els
    Decaluwe, Herbert
    De Leyn, Paul
    Vansteenkiste, Johan
    Weynand, Birgit
    Sagaert, Xavier
    Verbeken, Erik
    Wolthuis, Albert
    Topal, Baki
    Everaert, Wouter
    Bohnenberger, Hanibal
    Emmert, Alexander
    Panovska, Dena
    De Smet, Frederik
    Staal, Frank J. T.
    McLaughlin, Rene J.
    Impens, Francis
    Lagani, Vincenzo
    Vinckier, Stefan
    Mazzone, Massimiliano
    Schoonjans, Luc
    Dewerchin, Mieke
    Eelen, Guy
    Karakach, Tobias K.
    Yang, Huanming
    Wang, Jian
    Bolund, Lars
    Lin, Lin
    Thienpont, Bernard
    Li, Xuri
    Lambrechts, Diether
    Luo, Yonglun
    Carmeliet, Peter
    [J]. CANCER CELL, 2020, 37 (01) : 21 - +
  • [36] Guo DS, 2022, ELIFE, V11, DOI 10.7554/eLife.70341
  • [37] The Dynamic Transcriptional Cell Atlas of Testis Development during Human Puberty
    Guo, Jingtao
    Nie, Xichen
    Giebler, Maria
    Mlcochova, Hana
    Wang, Yueqi
    Grow, Edward J.
    DonorConnect
    Kim, Robin
    Tharmalingam, Melissa
    Matilionyte, Gabriele
    Lindskog, Cecilia
    Carrell, Douglas T.
    Mitchell, Rod T.
    Goriely, Anne
    Hotaling, James M.
    Cairns, Bradley R.
    [J]. CELL STEM CELL, 2020, 26 (02) : 262 - +
  • [38] The adult human testis transcriptional cell atlas
    Guo, Jingtao
    Grow, Edward J.
    Mlcochova, Hana
    Maher, Geoffrey J.
    Lindskog, Cecilia
    Nie, Xichen
    Guo, Yixuan
    Takei, Yodai
    Yun, Jina
    Cai, Long
    Kim, Robin
    Carrell, Douglas T.
    Goriely, Anne
    Hotaling, James M.
    Cairns, Bradley R.
    [J]. CELL RESEARCH, 2018, 28 (12) : 1141 - 1157
  • [39] Single-cell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis
    Habermann, Arun C.
    Gutierrez, Austin J.
    Bui, Linh T.
    Yahn, Stephanie L.
    Winters, Nichelle I.
    Calvi, Carla L.
    Peter, Lance
    Chung, Mei-I
    Taylor, Chase J.
    Jetter, Christopher
    Raju, Latha
    Roberson, Jamie
    Ding, Guixiao
    Wood, Lori
    Sucre, Jennifer M. S.
    Richmond, Bradley W.
    Serezani, Ana P.
    McDonnell, Wyatt J.
    Mallal, Simon B.
    Bacchetta, Matthew J.
    Loyd, James E.
    Shaver, Ciara M.
    Ware, Lorraine B.
    Bremner, Ross
    Walia, Rajat
    Blackwell, Timothy S.
    Banovich, Nicholas E.
    Kropski, Jonathan A.
    [J]. SCIENCE ADVANCES, 2020, 6 (28)
  • [40] CCR10+ epithelial cells from idiopathic pulmonary fibrosis lungs drive remodeling
    Habiel, David M.
    Espindola, Milena S.
    Jones, Isabelle C.
    Coelho, Ana Lucia
    Stripp, Barry
    Hogaboam, Cory M.
    [J]. JCI INSIGHT, 2018, 3 (16)