Generative pretraining from large-scale transcriptomes for single-cell deciphering

Times cited: 15
Authors
Shen, Hongru [1]
Liu, Jilei [1]
Hu, Jiani [1]
Shen, Xilin [1]
Zhang, Chao [2]
Wu, Dan [1]
Feng, Mengyao [1]
Yang, Meng [1]
Li, Yang [1]
Yang, Yichen [1]
Wang, Wei [3]
Zhang, Qiang [4]
Yang, Jilong [2]
Chen, Kexin [3]
Li, Xiangchun [1]
Affiliations
[1] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjin Canc Inst, Tianjins Clin Res Ctr Canc,Natl Clin Res Ctr Canc, Tianjin, Peoples R China
[2] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Dept Bone & Soft Tissue Tumor, Tianjins Clin Res Ctr Canc,Natl Clin Res Ctr Canc, Tianjin, Peoples R China
[3] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Dept Epidemiol & Biostat, Natl Clin Res Ctr Canc,Key Lab Mol Canc Epidemiol, Tianjin, Peoples R China
[4] Tianjin Med Univ, Tianjin Med Univ Canc Inst & Hosp, Tianjins Clin Res Ctr Canc, Dept Maxillofacial & Otorhinolaryngol Oncol,Natl C, Tianjin, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
EXPRESSION; TISSUES;
DOI
10.1016/j.isci.2023.106536
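Chinese Library Classification (CLC) number
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification code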
07 ; 0710 ; 09 ;
Abstract
Exponential accumulation of single-cell transcriptomes poses a great challenge for efficient assimilation. Here, we present an approach entitled generative pretraining from transcriptomes (tGPT) for learning feature representations of transcriptomes. tGPT is conceptually simple in that it autoregressively models the ranking of a gene in the context of its preceding neighbors. We developed tGPT with 22.3 million single-cell transcriptomes and used four single-cell datasets to evaluate its performance on single-cell analysis tasks. In addition, we examined its applications on bulk tissues. The single-cell clusters and cell lineage trajectories derived from tGPT are highly aligned with known cell labels and states. The feature patterns of tumor bulk tissues learned by tGPT are associated with a wide range of genomic alteration events, prognosis, and treatment outcome of immunotherapy. tGPT represents a new analytical paradigm for integrating and deciphering massive amounts of transcriptome data, and it will facilitate the interpretation and clinical translation of single-cell transcriptomes.
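The ranking-based autoregressive idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the descending sort by expression, the removal of unexpressed genes, the `top_k` cutoff, the tie-breaking by gene name, and the function names `rank_genes` and `next_gene_pairs` are all assumptions made for illustration. The sketch only shows how one cell's expression profile might be turned into an ordered gene sequence and then into (context, target) pairs, where each gene is predicted from the genes ranked before it.

```python
from typing import Dict, List, Tuple


def rank_genes(expression: Dict[str, float], top_k: int = 4) -> List[str]:
    """Order expressed genes from highest to lowest expression.

    Assumptions (not from the source): zero-expression genes are dropped,
    ties are broken alphabetically, and only the top_k genes are kept.
    """
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    expressed.sort(key=lambda gv: (-gv[1], gv[0]))
    return [gene for gene, _ in expressed[:top_k]]


def next_gene_pairs(ranked: List[str]) -> List[Tuple[List[str], str]]:
    """Build (context, target) pairs for autoregressive training.

    The target at position i is the i-th ranked gene; its context is
    every gene ranked before it.
    """
    return [(ranked[:i], ranked[i]) for i in range(1, len(ranked))]


# Toy expression profile for a single cell (values are made up).
cell = {"CD3E": 5.0, "GAPDH": 9.0, "MS4A1": 0.0, "ACTB": 7.5, "IL7R": 2.0}

ranked = rank_genes(cell)
pairs = next_gene_pairs(ranked)

for context, target in pairs:
    print(context, "->", target)
```

In a real model, each gene symbol would be mapped to a token ID and the pairs would feed a next-token prediction loss; that part is omitted here because the abstract does not describe it.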
Pages: 20