Scalable and unbiased sequence-informed embedding of single-cell ATAC-seq data with CellSpace

Cited by: 2
Authors
Tayyebi, Zakieh [1, 2]
Pine, Allison R. [1, 2]
Leslie, Christina S. [1]
Affiliations
[1] Memorial Sloan Kettering Cancer Center, Computational & Systems Biology Program, New York, NY 10065, USA
[2] Tri-Institutional Training Program in Computational Biology & Medicine, New York, NY, USA
DOI
10.1038/s41592-024-02274-x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Standard scATAC sequencing (scATAC-seq) analysis pipelines represent cells as sparse numeric vectors relative to an atlas of peaks or genomic tiles and consequently ignore genomic sequence information at accessible loci. Here we present CellSpace, an efficient and scalable sequence-informed embedding algorithm for scATAC-seq that learns a mapping of DNA k-mers and cells to the same space, to address this limitation. We show that CellSpace captures meaningful latent structure in scATAC-seq datasets, including cell subpopulations and developmental hierarchies, and can score transcription factor activities in single cells based on proximity to binding motifs embedded in the same space. Importantly, CellSpace implicitly mitigates batch effects arising from multiple samples, donors or assays, even when individual datasets are processed relative to different peak atlases. Thus, CellSpace provides a powerful tool for integrating and interpreting large-scale scATAC-seq compendia. By learning to embed DNA k-mers and cells into a joint space, CellSpace improves single-cell ATAC-seq analysis in multiple tasks such as latent structure discovery, transcription factor activity inference and batch effect mitigation.
Pages: 1014-1022
Number of pages: 21
Related papers
50 in total
  • [1] Network diffusion for scalable embedding of massive single-cell ATAC-seq data
    Dong, Kangning
    Zhang, Shihua
    SCIENCE BULLETIN, 2021, 66 (22) : 2271 - 2276
  • [2] Single-cell ATAC-seq: strength in numbers
    Pott, Sebastian
    Lieb, Jason D.
    GENOME BIOLOGY, 2015, 16
  • [3] Single-cell ATAC-seq: strength in numbers
    Sebastian Pott
    Jason D. Lieb
    Genome Biology, 16
  • [4] Assessment of computational methods for the analysis of single-cell ATAC-seq data
    Chen, Huidong
    Lareau, Caleb A.
    Andreani, Tommaso
    Vinyard, Michael E.
    Garcia, Sara P.
    Clement, Kendell
    Andrade-Navarro, Miguel
    Buenrostro, Jason D.
    Pinello, Luca
    GENOME BIOLOGY, 2019, 20 (01)
  • [5] Modeling Single-Cell ATAC-Seq Data Based on Contrastive Learning
    Lan, Wei
    Zhou, Weihao
    Chen, Qingfeng
    Zheng, Ruiqing
    Pan, Yi
    Chen, Yi-Ping Phoebe
    BIOINFORMATICS RESEARCH AND APPLICATIONS, PT I, ISBRA 2024, 2024, 14954 : 473 - 482
  • [6] Assessment of computational methods for the analysis of single-cell ATAC-seq data
    Huidong Chen
    Caleb Lareau
    Tommaso Andreani
    Michael E. Vinyard
    Sara P. Garcia
    Kendell Clement
    Miguel A. Andrade-Navarro
    Jason D. Buenrostro
    Luca Pinello
    Genome Biology, 20
  • [7] Decoding cell replicational age from single-cell ATAC-seq data
    Xiao, Yu
    Zhang, Yi
    NATURE BIOTECHNOLOGY, 2024
  • [8] simATAC: a single-cell ATAC-seq simulation framework
    Zeinab Navidi
    Lin Zhang
    Bo Wang
    Genome Biology, 22
  • [9] simATAC: a single-cell ATAC-seq simulation framework
    Navidi, Zeinab
    Zhang, Lin
    Wang, Bo
    GENOME BIOLOGY, 2021, 22 (01)
  • [10] Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data
    Wang, Xi
    Lian, Qiwei
    Dong, Haoyu
    Xu, Shuo
    Su, Yaru
    Wu, Xiaohui
    GENOMICS PROTEOMICS & BIOINFORMATICS, 2024, 22 (02)