A foundation model of transcription across human cell types

Cited by: 1
Authors
Fu, Xi [1, 2]
Mo, Shentong [3, 4]
Buendia, Alejandro [1]
Laurent, Anouchka P. [5]
Shao, Anqi [6]
Alvarez-Torres, Maria del Mar [1]
Yu, Tianji [1]
Tan, Jimin [7]
Su, Jiayu [1]
Sagatelian, Romella [1]
Ferrando, Adolfo A. [5, 8]
Ciccia, Alberto [9]
Lan, Yanyan [10, 11]
Owens, David M. [6, 12]
Palomero, Teresa [5, 12]
Xing, Eric P. [3, 4]
Rabadan, Raul [1, 2]
Affiliations
[1] Columbia Univ, Dept Syst Biol, Program Math Genom, New York, NY 10027 USA
[2] Columbia Univ, Dept Biomed Informat, New York, NY 10027 USA
[3] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi, U Arab Emirates
[4] Carnegie Mellon Univ, Dept Machine Learning, Pittsburgh, PA 15213 USA
[5] Columbia Univ, Inst Canc Genet, New York, NY USA
[6] Columbia Univ, Dept Dermatol, New York, NY USA
[7] NYU, Inst Syst Genet, Grossman Sch Med, New York, NY USA
[8] Regeneron, Regeneron Genet Ctr, Tarrytown, NY USA
[9] Columbia Univ, Dept Genet & Dev, New York, NY USA
[10] Tsinghua Univ, Inst AI Ind Res, Beijing, Peoples R China
[11] Tsinghua Univ, Beijing Frontier Res Ctr Biol Struct, Beijing, Peoples R China
[12] Columbia Univ, Dept Pathol & Cell Biol, New York, NY USA
Keywords
GENE-EXPRESSION; TARGET GENES; PAX5; METHYLATION; CHROMATIN; DNA
DOI
10.1038/s41586-024-08391-z
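Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification code
07; 0710; 09
Abstract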
Transcriptional regulation, which involves a complex interplay between regulatory sequences and proteins, directs all biological processes. Computational models of transcription lack generalizability to accurately extrapolate to unseen cell types and conditions. Here we introduce GET (general expression transformer), an interpretable foundation model designed to uncover regulatory grammars across 213 human fetal and adult cell types (refs. 1, 2). Relying exclusively on chromatin accessibility data and sequence information, GET achieves experimental-level accuracy in predicting gene expression even in previously unseen cell types (ref. 3). GET also shows remarkable adaptability across new sequencing platforms and assays, enabling regulatory inference across a broad range of cell types and conditions, and uncovers universal and cell-type-specific transcription factor interaction networks. We evaluated its performance in prediction of regulatory activity, inference of regulatory elements and regulators, and identification of physical interactions between transcription factors and found that it outperforms current models (ref. 4) in predicting lentivirus-based massively parallel reporter assay readout (refs. 5, 6). In fetal erythroblasts (ref. 7), we identified distal (greater than 1 Mbp) regulatory regions that were missed by previous models, and, in B cells, we identified a lymphocyte-specific transcription factor-transcription factor interaction that explains the functional significance of a leukaemia risk predisposing germline mutation (refs. 8-10). In sum, we provide a generalizable and accurate model for transcription together with catalogues of gene regulation and transcription factor interactions, all with cell type specificity.
Pages: 965-973
Page count: 28