Single-cell RNA-seq data clustering: A survey with performance comparison study

Cited by: 13
Authors
Li, Ruiyi [1]
Guan, Jihong [1]
Zhou, Shuigeng [2, 3]
Affiliations
[1] Tongji Univ, Dept Comp Sci & Technol, 4800 Caoan Rd, Shanghai, Peoples R China
[2] Fudan Univ, Shanghai Key Lab Intelligent Informat Proc, 220 Handan Rd, Shanghai, Peoples R China
[3] Fudan Univ, Sch Comp Sci, 220 Handan Rd, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Single-cell RNA-seq; clustering; performance comparison; data preprocessing; GENE-EXPRESSION; SEQUENCING DATA; HETEROGENEITY; VALIDATION; CHALLENGES; EMBRYOS; FATE;
DOI
10.1142/S0219720020400053
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Clustering analysis has been widely applied to single-cell RNA-sequencing (scRNA-seq) data to discover cell types and cell states. Algorithms developed in recent years have greatly helped the understanding of cellular heterogeneity and the underlying mechanisms of biological processes. However, these algorithms often use different techniques, were evaluated on different datasets, and were usually compared with only some of their counterparts using different performance metrics. Consequently, an accurate and complete picture of their merits and demerits is lacking, which makes it difficult for users to select proper algorithms for analyzing their data. To fill this gap, we first review the major existing scRNA-seq data clustering methods and then conduct a comprehensive performance comparison among them from multiple perspectives. We consider 13 state-of-the-art scRNA-seq data clustering algorithms and collect 12 publicly available real scRNA-seq datasets from existing works to evaluate and compare these algorithms. Our comparative study shows that the existing methods vary widely in performance. Even the top-performing algorithms do not perform well on all datasets, especially those with complex structures. This suggests that further research is required to develop more stable, accurate, and efficient clustering algorithms for scRNA-seq data.
Pages: 26