Dirichlet process mixture models for single-cell RNA-seq clustering

Cited by: 6

Authors:

Adossa, Nigatu A. ^{[1,2]}

Rytkonen, Kalle T. ^{[1,2,3]}

Elo, Laura L. ^{[1,2,4]}

Affiliations:

[1] Univ Turku, Turku Biosci Ctr, FI-20520 Turku, Finland

[2] Abo Akad Univ, FI-20520 Turku, Finland

[3] Univ Turku, Res Ctr Integrat Physiol & Pharmacol, Inst Biomed, FI-20014 Turku, Finland

[4] Univ Turku, Inst Biomed, FI-20014 Turku, Finland

Source:

BIOLOGY OPEN | 2022 / Volume 11 / Issue 04

Funding:

Academy of Finland;

Keywords:

Clustering; Hierarchical Dirichlet process (HDP); Latent Dirichlet allocation (LDA); ScRNA-seq; VARIATIONAL INFERENCE; RECONSTRUCTION;

DOI:

10.1242/bio.059001

Chinese Library Classification (CLC) number:

Q [Biological sciences];

Subject classification codes:

07 ; 0710 ; 09 ;

Abstract:

Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is that the number of clusters is unknown, and there is still no comprehensive solution to this problem. To enhance the process of defining a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (Davies-Bouldin index, DB-index) and extrinsic (adjusted Rand index, ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.

Pages: 9
