Transfer learning enables predictions in network biology

Cited by: 226
Authors
Theodoris, Christina V. [1, 2, 3, 4]
Xiao, Ling [2, 5]
Chopra, Anant [6]
Chaffin, Mark D. [2]
Al Sayed, Zeina R. [2]
Hill, Matthew C. [2, 5]
Mantineo, Helene [2, 5]
Brydon, Elizabeth M. [6]
Zeng, Zexian [1, 7]
Liu, X. Shirley [1, 7, 8]
Ellinor, Patrick T. [2, 5]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02215 USA
[2] Broad Inst MIT & Harvard, Cardiovasc Dis Initiat & Precis Cardiol Lab, Cambridge, MA 02142 USA
[3] Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02115 USA
[4] Harvard Med Sch, Genet Training Program, Boston, MA 02115 USA
[5] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[6] Bayer US LLC, Precis Cardiol Lab, Cambridge, MA USA
[7] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[8] Dana Farber Canc Inst, Ctr Funct Canc Epigenet, Boston, MA USA
Funding
National Institutes of Health (USA)
Keywords
SINGLE-CELL TRANSCRIPTOME; IN-VITRO; DIFFERENTIATION; MUTATIONS; GENES; HETEROGENEITY; TRAJECTORIES; LANDSCAPE; ORGANOIDS; SUBSETS;
DOI
10.1038/s41586-023-06139-9
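Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract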
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding (refs. 1, 2) and computer vision (ref. 3) by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Pages: 616-624
Number of pages: 32
Related papers
159 in total
  • [1] 10xgenomics, 10X GENOMICS DATASE
  • [2] Single-cell RNA-seq reveals ectopic and aberrant lung-resident cell populations in idiopathic pulmonary fibrosis
    Adams, Taylor S.
    Schupp, Jonas C.
    Poli, Sergio
    Ayaub, Ehab A.
    Neumark, Nir
    Ahangari, Farida
    Chu, Sarah G.
    Raby, Benjamin A.
    DeTullis, Giuseppe
    Januszyk, Michael
    Duan, Qiaonan
    Arnett, Heather A.
    Siddiqui, Asim
    Washko, George R.
    Homer, Robert
    Yan, Xiting
    Rosas, Ivan O.
    Kaminski, Naftali
    [J]. SCIENCE ADVANCES, 2020, 6 (28)
  • [3] A single-cell atlas of the human substantia nigra reveals cell-specific pathways associated with neurological disorders
    Agarwal, Devika
    Sandor, Cynthia
    Volpato, Viola
    Caffrey, Tara M.
    Monzon-Sandoval, Jimena
    Bowden, Rory
    Alegre-Abarrategui, Javier
    Wade-Martins, Richard
    Webber, Caleb
    [J]. NATURE COMMUNICATIONS, 2020, 11 (01)
  • [4] Roles of cardiac transcription factors in cardiac hypertrophy
    Akazawa, H
    Komuro, I
    [J]. CIRCULATION RESEARCH, 2003, 92 (10) : 1079 - 1088
  • [5] NKX2-5 regulates human cardiomyogenesis via a HEY2 dependent transcriptional network
    Anderson, David J.
    Kaplan, David I.
    Bell, Katrina M.
    Koutsis, Katerina
    Haynes, John M.
    Mills, Richard J.
    Phelan, Dean G.
    Qian, Elizabeth L.
    Leitoguinho, Ana Rita
    Arasaratnam, Deevina
    Labonne, Tanya
    Ng, Elizabeth S.
    Davis, Richard P.
    Casini, Simona
    Passier, Robert
    Hudson, James E.
    Porrello, Enzo R.
    Costa, Mauro W.
    Rafii, Arash
    Curl, Clare L.
    Delbridge, Lea M.
    Harvey, Richard P.
    Oshlack, Alicia
    Cheung, Michael M.
    Mummery, Christine L.
    Petrou, Stephen
    Elefanty, Andrew G.
    Stanley, Edouard G.
    Elliott, David A.
    [J]. NATURE COMMUNICATIONS, 2018, 9
  • [6] Disease Model of GATA4 Mutation Reveals Transcription Factor Cooperativity in Human Cardiogenesis
    Ang, Yen-Sin
    Rivas, Renee N.
    Ribeiro, Alexandre J. S.
    Srivas, Rohith
    Rivera, Janell
    Stone, Nicole R.
    Pratt, Karishma
    Mohamed, Tamer M. A.
    Fu, Ji-Dong
    Spencer, C. Ian
    Tippens, Nathaniel D.
    Li, Molong
    Narasimha, Anil
    Radzinsky, Ethan
    Moon-Grady, Anita J.
    Yu, Haiyuan
    Pruitt, Beth L.
    Snyder, Michael P.
    Srivastava, Deepak
    [J]. CELL, 2016, 167 (07) : 1734 - +
  • [7] [Anonymous], GEN DAT
  • [8] Changes in bone marrow innate lymphoid cell subsets in monoclonal gammopathy: target for IMiD therapy
    Bailur, Jithendra Kini
    Mehta, Sameet
    Zhang, Lin
    Neparidze, Natalia
    Parker, Terri
    Bar, Noffar
    Anderson, Tara
    Xu, Mina L.
    Dhodapkar, Kavita M.
    Dhodapkar, Madhav V.
    [J]. BLOOD ADVANCES, 2017, 1 (25) : 2343 - 2347
  • [9] Large-Scale Human Dendritic Cell Differentiation Revealing Notch-Dependent Lineage Bifurcation and Heterogeneity
    Balan, Sreekumar
    Arnold-Schrauf, Catharina
    Abbas, Abdenour
    Couespel, Norbert
    Savoret, Juliette
    Imperatore, Francesco
    Villani, Alexandra-Chloe
    Thien-Phong Vu Manh
    Bhardwaj, Nina
    Dalod, Marc
    [J]. CELL REPORTS, 2018, 24 (07) : 1902 - +
  • [10] A Single-Cell Transcriptomic Map of the Human and Mouse Pancreas Reveals Inter- and Intra-cell Population Structure
    Baron, Maayan
    Veres, Adrian
    Wolock, Samuel L.
    Faust, Aubrey L.
    Gaujoux, Renaud
    Vetere, Amedeo
    Ryu, Jennifer Hyoje
    Wagner, Bridget K.
    Shen-Orr, Shai S.
    Klein, Allon M.
    Melton, Douglas A.
    Yanai, Itai
    [J]. CELL SYSTEMS, 2016, 3 (04) : 346 - +