SAILER: scalable and accurate invariant representation learning for single-cell ATAC-seq processing and integration

Cited by: 10
Authors
Cao, Yingxin [1, 2, 3]
Fu, Laiyi [1, 4]
Wu, Jie [5]
Peng, Qinke [4]
Nie, Qing [2, 3, 6]
Zhang, Jing [1]
Xie, Xiaohui [1]
Affiliations
[1] Univ Calif Irvine, Dept Comp Sci, Irvine, CA 92697 USA
[2] Univ Calif Irvine, Ctr Complex Biol Syst, Irvine, CA 92697 USA
[3] Univ Calif Irvine, NSF Simons Ctr Multiscale Cell Fate Res, Irvine, CA 92697 USA
[4] Xi An Jiao Tong Univ, Sch Elect & Informat Engn, Syst Engn Inst, Xian 710049, Shaanxi, Peoples R China
[5] Univ Calif Irvine, Dept Biol Chem, Irvine, CA 92697 USA
[6] Univ Calif Irvine, Dept Math, Irvine, CA 92697 USA
Keywords
CHROMATIN ACCESSIBILITY
DOI
10.1093/bioinformatics/btab303
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Motivation: Single-cell sequencing assay for transposase-accessible chromatin (scATAC-seq) provides new opportunities to dissect epigenomic heterogeneity and elucidate transcriptional regulatory mechanisms. However, computational modeling of scATAC-seq data is challenging because of its high dimensionality, extreme sparsity, complex dependencies and high sensitivity to confounding factors from various sources.
Results: Here, we propose a new deep generative modeling framework, named SAILER, for analyzing scATAC-seq data. SAILER aims to learn a low-dimensional nonlinear latent representation of each cell that defines its intrinsic chromatin state and is invariant to extrinsic confounding factors such as read depth and batch effects. SAILER adopts the conventional encoder-decoder framework to learn the latent representation but imposes additional constraints to ensure that the learned representations are independent of the confounding factors. Experimental results on both simulated and real scATAC-seq datasets demonstrate that SAILER learns better and more biologically meaningful representations of cells than other methods. Its noise-free cell embeddings bring significant benefits to downstream analyses: clustering and imputation based on SAILER yield 6.9% and 18.5% improvements over existing methods, respectively. Moreover, because no matrix factorization is involved, SAILER can easily scale to process millions of cells. We implemented SAILER as a software package, freely available to all for large-scale scATAC-seq data analysis.
Pages: i317-i326
Page count: 10