A fast, scalable and versatile tool for analysis of single-cell omics data

Cited by: 30
Authors
Zhang, Kai [1, 6]
Zemke, Nathan R. [1, 2]
Armand, Ethan J. [1, 3]
Ren, Bing [1, 2, 4, 5]
Affiliations
[1] Univ Calif San Diego, Sch Med, Dept Cellular & Mol Med, La Jolla, CA 92093 USA
[2] Univ Calif San Diego, Ctr Epigen, Sch Med, La Jolla, CA 92093 USA
[3] Univ Calif San Diego, Bioinformat & Syst Biol Program, La Jolla, CA USA
[4] Ludwig Inst Canc Res, La Jolla, CA 92093 USA
[5] Univ Calif San Diego, Inst Genom Med, La Jolla, CA 92093 USA
[6] Westlake Univ, Sch Life Sci, Westlake Lab Life Sci & Biomed, Hangzhou, Peoples R China
Funding
National Institutes of Health (USA);
Keywords
DIFFUSION MAPS; CHROMATIN;
DOI
10.1038/s41592-023-02139-9
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Single-cell omics technologies have revolutionized the study of gene regulation in complex tissues. A major computational challenge in analyzing these datasets is to project the large-scale, high-dimensional data into a low-dimensional space while retaining the relative relationships between cells. This low-dimensional embedding is necessary to decompose cellular heterogeneity and reconstruct cell-type-specific gene regulatory programs. Traditional dimensionality reduction techniques, however, face challenges in computational efficiency and in comprehensively addressing cellular diversity across varied molecular modalities. Here we introduce a nonlinear dimensionality reduction algorithm, implemented in the Python package SnapATAC2, which captures the heterogeneity of single-cell omics data more precisely while keeping runtime and memory usage efficient, scaling linearly with the number of cells. Our algorithm demonstrates exceptional performance, scalability and versatility across diverse single-cell omics datasets, including single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq), single-cell RNA sequencing (scRNA-seq), single-cell Hi-C and single-cell multi-omics datasets, underscoring its utility in advancing single-cell analysis. SnapATAC2 uses a matrix-free spectral embedding algorithm for nonlinear dimension reduction of single-cell omics data, which shows improved performance in capturing cellular heterogeneity and improved scalability for large datasets.
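The "matrix-free" idea mentioned in the abstract can be illustrated with a minimal sketch. It is not SnapATAC2's implementation or API. It assumes a cosine-similarity graph between cells and uses plain orthogonal (subspace) iteration in place of a production eigensolver; the function names `normalize_rows` and `spectral_embedding` and the toy matrix `cells` are hypothetical. The point it shows is that the n x n cell-by-cell similarity matrix W = X X^T is never built: every product W v is computed as X (X^T v), so the cost of each iteration grows linearly with the number of cells.

```python
import math
import random


def normalize_rows(X):
    """Scale every cell (row) to unit L2 norm so that dot products are cosine similarities.
    Assumes each row has at least one nonzero entry."""
    out = []
    for row in X:
        norm = math.sqrt(sum(v * v for v in row))
        out.append([v / norm for v in row])
    return out


def spectral_embedding(X, n_components=1, n_iter=200, seed=0):
    """Leading nontrivial eigenvectors of S = D^-1/2 W D^-1/2 with W = Xn Xn^T,
    computed without ever forming the n x n matrix W."""
    Xn = normalize_rows(X)
    n, m = len(Xn), len(Xn[0])

    def apply_w(v):
        # W v = Xn (Xn^T v): two passes over the data, O(n * m) work.
        t = [sum(Xn[i][j] * v[i] for i in range(n)) for j in range(m)]
        return [sum(Xn[i][j] * t[j] for j in range(m)) for i in range(n)]

    # Node degrees d = W 1 (self-similarity included in this sketch).
    degree = apply_w([1.0] * n)
    inv_sqrt_d = [1.0 / math.sqrt(d) for d in degree]

    def apply_s(v):
        w = apply_w([a * b for a, b in zip(inv_sqrt_d, v)])
        return [a * b for a, b in zip(inv_sqrt_d, w)]

    rng = random.Random(seed)
    k = n_components + 1  # one extra vector for the trivial leading eigenvector
    basis = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(k)]

    for _ in range(n_iter):
        basis = [apply_s(v) for v in basis]
        # Gram-Schmidt keeps the vectors orthonormal between iterations.
        for a in range(k):
            for b in range(a):
                proj = sum(x * y for x, y in zip(basis[a], basis[b]))
                basis[a] = [x - proj * y for x, y in zip(basis[a], basis[b])]
            norm = math.sqrt(sum(x * x for x in basis[a]))
            basis[a] = [x / norm for x in basis[a]]

    # Drop the trivial leading eigenvector; return one row of coordinates per cell.
    return [[basis[c][i] for c in range(1, k)] for i in range(n)]


# Toy cell-by-feature matrix: cells 0-2 and cells 3-5 form two groups
# that share only the last feature.
cells = [
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 0, 1],
]
embedding = spectral_embedding(cells, n_components=1)
for i, row in enumerate(embedding):
    print(i, row)
```

On this toy input the single embedding coordinate takes one sign for the first three cells and the opposite sign for the last three, which is the kind of group structure a spectral embedding is meant to expose. A practical implementation would use sparse matrices and a Lanczos-type eigensolver rather than Python lists and subspace iteration.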
Pages: 217-227
Page count: 30