Secuer: Ultrafast, scalable and accurate clustering of single-cell RNA-seq data

Cited by: 5
Authors
Wei, Nana [1]
Nie, Yating [1]
Liu, Lin [2]
Zheng, Xiaoqi [3, 4]
Wu, Hua-Jun [5, 6]
Affiliations
[1] Shanghai Normal Univ, Dept Math, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, SJTU Yale Joint Ctr Biostat & Data Sci, CMA Shanghai, Inst Nat Sci,MOE LSC,Sch Math Sci, Shanghai, Peoples R China
[3] Shanghai Artificial Intelligence Lab, Shanghai, Peoples R China
[4] Shanghai Jiao Tong Univ, Ctr Single Cell Omics, Sch Publ Hlth, Sch Med, Shanghai, Peoples R China
[5] Peking Univ Hlth Sci Ctr, Ctr Precis Med Multiom Res, Sch Basic Med Sci, Beijing, Peoples R China
[6] Peking Univ Canc Hosp & Inst, Beijing, Peoples R China
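Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords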
HETEROGENEITY; TRANSCRIPTOMES; FATE;
DOI
10.1371/journal.pcbi.1010753
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010; 081704;
Abstract
Identifying cell clusters is a critical step for single-cell transcriptomics study. Despite the numerous clustering tools developed recently, the rapid growth of scRNA-seq volumes prompts for a more (computationally) efficient clustering method. Here, we introduce Secuer, a Scalable and Efficient speCtral clUstERing algorithm for scRNA-seq data. By employing an anchor-based bipartite graph representation algorithm, Secuer enjoys reduced runtime and memory usage over one order of magnitude for datasets with more than 1 million cells. Meanwhile, Secuer also achieves better or comparable accuracy than competing methods in small and moderate benchmark datasets. Furthermore, we showcase that Secuer can also serve as a building block for a new consensus clustering method, Secuer-consensus, which again improves the runtime and scalability of state-of-the-art consensus clustering methods while also maintaining the accuracy. Overall, Secuer is a versatile, accurate, and scalable clustering framework suitable for small to ultra-large single-cell clustering tasks.

Author summary
Recently, single-cell RNA sequencing (scRNA-seq) has enabled profiling of thousands to millions of cells, spurring the development of efficient clustering algorithms for large or ultra-large datasets. In this work, we developed an ultrafast clustering method, Secuer, for small to ultra-large scRNA-seq data. Using simulation and real datasets, we demonstrated that Secuer yields high accuracy, while saving runtime and memory usage by orders of magnitude, and that it can be efficiently scaled up to ultra-large datasets. Additionally, with Secuer as a subroutine, we proposed Secuer-consensus, a consensus clustering algorithm. Our results show that Secuer-consensus performs better in terms of clustering accuracy and runtime.
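
The abstract credits the speed-up to an anchor-based bipartite graph representation. The following is a minimal NumPy sketch of that general idea, not Secuer's actual implementation: cells are linked only to a small set of anchors, so the spectral step works on an n x p matrix rather than an n x n one. Random anchor sampling, the Gaussian kernel, the dense SVD, and every name here (`anchor_spectral_clustering`, `kmeans`, `n_anchors`, `n_neighbors`) are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    centers = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # Squared distance from every point to its closest chosen center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def anchor_spectral_clustering(X, n_clusters, n_anchors=50, n_neighbors=5, seed=0):
    """Cluster the rows of X via a cell-to-anchor bipartite graph (sketch)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_anchors = min(n_anchors, n)
    n_neighbors = min(n_neighbors, n_anchors)

    # 1. Anchors: a random subset of cells (assumption made for this sketch).
    anchors = X[rng.choice(n, size=n_anchors, replace=False)]

    # 2. Squared distances from each cell to each anchor (n x p). For very
    #    large n this would be computed in chunks or with approximate search.
    d2 = ((X[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    nn = np.argsort(d2, axis=1)[:, :n_neighbors]
    rows = np.arange(n)[:, None]

    # 3. Bipartite affinity matrix B: Gaussian weights on the nearest anchors.
    sigma2 = d2[rows, nn].mean() + 1e-12
    B = np.zeros((n, n_anchors))
    B[rows, nn] = np.exp(-d2[rows, nn] / (2.0 * sigma2))

    # 4. Degree-normalise both sides of the bipartite graph.
    d_cells = B.sum(axis=1) + 1e-12
    d_anchors = B.sum(axis=0) + 1e-12
    B_norm = B / np.sqrt(d_cells)[:, None] / np.sqrt(d_anchors)[None, :]

    # 5. The top left singular vectors of B_norm give the cell embedding.
    U, _, _ = np.linalg.svd(B_norm, full_matrices=False)
    emb = U[:, :n_clusters] / np.sqrt(d_cells)[:, None]
    emb /= np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12

    # 6. Cluster the embedded cells.
    return kmeans(emb, n_clusters, rng)
```

Calling `anchor_spectral_clustering(X, n_clusters=2)` on a cells-by-features array returns one integer label per cell. The design point the sketch illustrates is that the distance computation and the decomposition scale with the number of anchors p rather than with the number of cells squared.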
Pages: 20
Related papers
50 in total
  • [31] Scedar: A scalable Python package for single-cell RNA-seq exploratory data analysis
    Zhang, Yuanchao
    Kim, Man S.
    Reichenberger, Erin R.
    Stear, Ben
    Taylor, Deanne M.
    PLOS COMPUTATIONAL BIOLOGY, 2020, 16 (04)
  • [32] SCnorm: robust normalization of single-cell RNA-seq data
    Bacher, Rhonda
    Chu, Li-Fang
    Leng, Ning
    Gasch, Audrey P.
    Thomson, James A.
    Stewart, Ron M.
    Newton, Michael
    Kendziorski, Christina
    NATURE METHODS, 2017, 14 (06) : 584 - +
  • [33] Testing for Phylogenetic Signal in Single-Cell RNA-Seq Data
    Moravec, Jiri C.
    Lanfear, Robert
    Spector, David L.
    Diermeier, Sarah D.
    Gavryushkin, Alex
    JOURNAL OF COMPUTATIONAL BIOLOGY, 2023, 30 (04) : 518 - 537
  • [34] Enhancing Clustering of single-cell RNA-seq data by Proximity Learning on Random Projected spaces
    Vrahatis, Aristidis G.
    Dimitrakopoulos, Georgios N.
    Tasoulis, Sotiris K.
    Plagianakos, Vassilis P.
    2019 IEEE 19TH INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOENGINEERING (BIBE), 2019 : 846 - 849
  • [35] ScGSLC: An unsupervised graph similarity learning framework for single-cell RNA-seq data clustering
    Li, Junyi
    Jiang, Wei
    Han, Henry
    Liu, Jing
    Liu, Bo
    Wang, Yadong
    COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2021, 90
  • [36] Evaluation of single-cell RNA-seq clustering algorithms on cancer tumor datasets
    Mahalanabis, Alaina
    Turinsky, Andrei L.
    Husic, Mia
    Christensen, Erik
    Luo, Ping
    Naidas, Alaine
    Brudno, Michael
    Pugh, Trevor
    Ramani, Arun K.
    Shooshtari, Parisa
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2022, 20 : 6375 - 6387
  • [37] scPred: accurate supervised method for cell-type classification from single-cell RNA-seq data
    Alquicira-Hernandez, Jose
    Sathe, Anuja
    Ji, Hanlee P.
    Quan Nguyen
    Powell, Joseph E.
    GENOME BIOLOGY, 2019, 20 (01)
  • [38] Valid Post-clustering Differential Analysis for Single-Cell RNA-Seq
    Zhang, Jesse M.
    Kamath, Govinda M.
    Tse, David N.
    CELL SYSTEMS, 2019, 9 (04) : 383 - +
  • [39] scGNN 2.0: a graph neural network tool for imputation and clustering of single-cell RNA-Seq data
    Gu, Haocheng
    Cheng, Hao
    Ma, Anjun
    Li, Yang
    Wang, Juexin
    Xu, Dong
    Ma, Qin
    BIOINFORMATICS, 2022, 38 (23) : 5322 - 5325
  • [40] GRACE: A Graph-Based Cluster Ensemble Approach for Single-Cell RNA-Seq Data Clustering
    Guan, Jihong
    Li, Rui-Yi
    Wang, Jiasheng
    IEEE ACCESS, 2020, 8 : 166730 - 166741