Transfer learning enables predictions in network biology

Cited by: 226
Authors
Theodoris, Christina V. [1, 2, 3, 4]
Xiao, Ling [2, 5]
Chopra, Anant [6]
Chaffin, Mark D. [2]
Al Sayed, Zeina R. [2]
Hill, Matthew C. [2, 5]
Mantineo, Helene [2, 5]
Brydon, Elizabeth M. [6]
Zeng, Zexian [1, 7]
Liu, X. Shirley [1, 7, 8]
Ellinor, Patrick T. [2, 5]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02215 USA
[2] Broad Inst MIT & Harvard, Cardiovasc Dis Initiat & Precis Cardiol Lab, Cambridge, MA 02142 USA
[3] Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02115 USA
[4] Harvard Med Sch, Genet Training Program, Boston, MA 02115 USA
[5] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[6] Bayer US LLC, Precis Cardiol Lab, Cambridge, MA USA
[7] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[8] Dana Farber Canc Inst, Ctr Funct Canc Epigenet, Boston, MA USA
Funding
National Institutes of Health (NIH), USA
Keywords
SINGLE-CELL TRANSCRIPTOME; IN-VITRO; DIFFERENTIATION; MUTATIONS; GENES; HETEROGENEITY; TRAJECTORIES; LANDSCAPE; ORGANOIDS; SUBSETS;
DOI
10.1038/s41586-023-06139-9
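Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification codes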
07 ; 0710 ; 09 ;
Abstract
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1, 2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Pages: 616-624
Number of pages: 32