Dirichlet process mixture models for single-cell RNA-seq clustering

Cited by: 6
Authors
Adossa, Nigatu A. [1 ,2 ]
Rytkonen, Kalle T. [1 ,2 ,3 ]
Elo, Laura L. [1 ,2 ,4 ]
Affiliations
[1] Univ Turku, Turku Biosci Ctr, FI-20520 Turku, Finland
[2] Abo Akad Univ, FI-20520 Turku, Finland
[3] Univ Turku, Res Ctr Integrat Physiol & Pharmacol, Inst Biomed, FI-20014 Turku, Finland
[4] Univ Turku, Inst Biomed, FI-20014 Turku, Finland
Source
BIOLOGY OPEN | 2022 / Vol. 11 / Issue 04
Funding
Academy of Finland;
Keywords
Clustering; Hierarchical Dirichlet process (HDP); Latent Dirichlet allocation (LDA); scRNA-seq; Variational inference; Reconstruction
DOI
10.1242/bio.059001
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07; 0710; 09
Abstract
Clustering of cells based on gene expression is one of the major steps in single-cell RNA-sequencing (scRNA-seq) data analysis. One key challenge in cluster analysis is the unknown number of clusters, and there is still no comprehensive solution to this issue. To enhance the process of defining a meaningful cluster resolution, we compare the Bayesian latent Dirichlet allocation (LDA) method with its non-parametric counterpart, the hierarchical Dirichlet process (HDP), in the context of clustering scRNA-seq data. A potential main advantage of HDP is that it does not require the number of clusters as an input parameter from the user. While LDA has been used in single-cell data analysis, it has not been compared in detail with HDP. Here, we compare the cell clustering performance of LDA and HDP using four scRNA-seq datasets (immune cells, kidney, pancreas and decidua/placenta), with a specific focus on cluster numbers. Using both intrinsic (DB-index) and extrinsic (ARI) cluster quality measures, we show that the performance of LDA and HDP is dataset-dependent. We describe a case where HDP produced a more appropriate clustering than the best performer from a series of LDA clusterings with different numbers of clusters. However, we also observed cases where the best-performing LDA cluster numbers appropriately captured the main biological features while HDP tended to inflate the number of clusters. Overall, our study highlights the importance of carefully assessing the number of clusters when analyzing scRNA-seq data.
Pages: 9
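
The abstract names two cluster quality measures: the extrinsic adjusted Rand index (ARI), which scores agreement with reference labels, and the intrinsic Davies-Bouldin index (DB-index), which needs only the data and the cluster assignments. The following is a minimal, standard-library sketch of both measures, not the authors' pipeline. The function names, the toy 2-D points and the three labelings are all hypothetical and chosen only for illustration; in practice one would more likely use scikit-learn's `adjusted_rand_score` and `davies_bouldin_score`.

```python
from collections import Counter
from math import comb, dist


def adjusted_rand_index(labels_true, labels_pred):
    """Extrinsic measure: pairwise agreement with reference labels, corrected for chance."""
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0
    # Pairs of cells placed together in both labelings, in the reference, in the prediction.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    same_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    same_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = same_true * same_pred / total_pairs
    maximum = (same_true + same_pred) / 2
    if maximum == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (same_both - expected) / (maximum - expected)


def davies_bouldin_index(points, labels):
    """Intrinsic measure: lower values mean more compact, better separated clusters.

    Assumes at least two clusters with distinct centroids.
    """
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("Davies-Bouldin index needs at least two clusters")
    centroids = {
        label: tuple(sum(col) / len(pts) for col in zip(*pts))
        for label, pts in clusters.items()
    }
    # Scatter: mean distance of a cluster's points to its own centroid.
    scatter = {
        label: sum(dist(p, centroids[label]) for p in pts) / len(pts)
        for label, pts in clusters.items()
    }
    worst_ratios = [
        max(
            (scatter[i] + scatter[j]) / dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
        for i in clusters
    ]
    return sum(worst_ratios) / len(clusters)


# Hypothetical toy data: two well-separated groups of four "cells" each.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
truth = [0, 0, 0, 0, 1, 1, 1, 1]
coarse = [0, 0, 0, 0, 1, 1, 1, 1]    # recovers the two groups
inflated = [0, 0, 1, 1, 2, 2, 3, 3]  # splits each group in two

for name, labels in [("coarse", coarse), ("inflated", inflated)]:
    ari = adjusted_rand_index(truth, labels)
    dbi = davies_bouldin_index(points, labels)
    print(f"{name}: ARI={ari:.3f}, DB-index={dbi:.3f}")
```

On this toy data the over-split labeling gets a lower ARI and a higher DB-index than the two-cluster labeling, which is the kind of signal one would look for when checking whether a method has inflated the number of clusters.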