A foundation model of transcription across human cell types

Cited by: 1
Authors
Fu, Xi [1, 2]
Mo, Shentong [3, 4]
Buendia, Alejandro [1]
Laurent, Anouchka P. [5]
Shao, Anqi [6]
Alvarez-Torres, Maria del Mar [1]
Yu, Tianji [1]
Tan, Jimin [7]
Su, Jiayu [1]
Sagatelian, Romella [1]
Ferrando, Adolfo A. [5, 8]
Ciccia, Alberto [9]
Lan, Yanyan [10, 11]
Owens, David M. [6, 12]
Palomero, Teresa [5, 12]
Xing, Eric P. [3, 4]
Rabadan, Raul [1, 2]
Affiliations
[1] Columbia Univ, Dept Syst Biol, Program Math Genom, New York, NY 10027 USA
[2] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
[3] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[4] Carnegie Mellon Univ, Dept Machine Learning, Pittsburgh, PA 15213 USA
[5] Columbia Univ, Inst Canc Genet, New York, NY USA
[6] Columbia Univ, Dept Dermatol, New York, NY USA
[7] NYU, Inst Syst Genet, Grossman Sch Med, New York, NY USA
[8] Regeneron, Regeneron Genet Ctr, Tarrytown, NY USA
[9] Columbia Univ, Dept Genet & Dev, New York, NY USA
[10] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[11] Tsinghua Univ, Beijing Frontier Res Ctr Biol Struct, Beijing, Peoples R China
[12] Columbia Univ, Dept Pathol & Cell Biol, New York, NY USA
Keywords
GENE-EXPRESSION; TARGET GENES; PAX5; METHYLATION; CHROMATIN; DNA;
DOI
10.1038/s41586-024-08391-z
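Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code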
07 ; 0710 ; 09 ;
Abstract
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types [1,2]. Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types [3]. GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models [4] in predicting lentivirus-based massively parallel reporter assay readout [5,6]. In fetal erythroblasts [7], we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation [8-10]. In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
Pages: 965-973
Number of pages: 28