Transfer learning enables predictions in network biology

Cited by: 226
Authors
Theodoris, Christina V. [1, 2, 3, 4]
Xiao, Ling [2, 5]
Chopra, Anant [6]
Chaffin, Mark D. [2]
Al Sayed, Zeina R. [2]
Hill, Matthew C. [2, 5]
Mantineo, Helene [2, 5]
Brydon, Elizabeth M. [6]
Zeng, Zexian [1, 7]
Liu, X. Shirley [1, 7, 8]
Ellinor, Patrick T. [2, 5]
Affiliations
[1] Dana Farber Canc Inst, Dept Data Sci, Boston, MA 02215 USA
[2] Broad Inst MIT & Harvard, Cardiovasc Dis Initiat & Precis Cardiol Lab, Cambridge, MA 02142 USA
[3] Boston Childrens Hosp, Div Genet & Genom, Boston, MA 02115 USA
[4] Harvard Med Sch, Genet Training Program, Boston, MA 02115 USA
[5] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[6] Bayer US LLC, Precis Cardiol Lab, Cambridge, MA USA
[7] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA USA
[8] Dana Farber Canc Inst, Ctr Funct Canc Epigenet, Boston, MA USA
Funding
National Institutes of Health (USA)
Keywords
SINGLE-CELL TRANSCRIPTOME; IN-VITRO; DIFFERENTIATION; MUTATIONS; GENES; HETEROGENEITY; TRAJECTORIES; LANDSCAPE; ORGANOIDS; SUBSETS;
DOI
10.1038/s41586-023-06139-9
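Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09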
Abstract
Mapping gene networks requires large amounts of transcriptomic data to learn the connections between genes, which impedes discoveries in settings with limited data, including rare diseases and diseases affecting clinically inaccessible tissues. Recently, transfer learning has revolutionized fields such as natural language understanding [1,2] and computer vision [3] by leveraging deep learning models pretrained on large-scale general datasets that can then be fine-tuned towards a vast array of downstream tasks with limited task-specific data. Here, we developed a context-aware, attention-based deep learning model, Geneformer, pretrained on a large-scale corpus of about 30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. Fine-tuning towards a diverse panel of downstream tasks relevant to chromatin and network dynamics using limited task-specific data demonstrated that Geneformer consistently boosted predictive accuracy. Applied to disease modelling with limited patient data, Geneformer identified candidate therapeutic targets for cardiomyopathy. Overall, Geneformer represents a pretrained deep learning model from which fine-tuning towards a broad range of downstream applications can be pursued to accelerate discovery of key network regulators and candidate therapeutic targets.
Pages: 616-624
Page count: 32