The Poisson distribution model fits UMI-based single-cell RNA-sequencing data

Cited by: 3
Authors
Pan, Yue [1, 2]
Landis, Justin T. [2, 3]
Moorad, Razia [2, 3]
Wu, Di [1, 4]
Marron, J. S. [1, 5]
Dittmer, Dirk P. [2, 3]
Affiliations
[1] Univ N Carolina, Dept Biostat, Chapel Hill, NC USA
[2] Univ N Carolina, Lineberger Comprehens Canc Ctr, Chapel Hill, NC 27599 USA
[3] Univ N Carolina, Dept Microbiol & Immunol, Chapel Hill, NC 27599 USA
[4] Univ N Carolina, Adam Sch Dent, Chapel Hill, NC USA
[5] Univ N Carolina, Dept Stat & Operat Res, Chapel Hill, NC USA
Keywords
Single cell; RNA-seq; Poisson distribution; Data representation; Gene expression
DOI
10.1186/s12859-023-05349-2
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: Modeling of single cell RNA-sequencing (scRNA-seq) data remains challenging due to a high percentage of zeros and data heterogeneity, so improved modeling has strong potential to benefit many downstream data analyses. The existing zero-inflated or over-dispersed models are based on aggregations at either the gene or the cell level. However, they typically lose accuracy due to a too crude aggregation at those two levels.
Results: We avoid the crude approximations entailed by such aggregation through proposing an independent Poisson distribution (IPD) particularly at each individual entry in the scRNA-seq data matrix. This approach naturally and intuitively models the large number of zeros as matrix entries with a very small Poisson parameter. The critical challenge of cell clustering is approached via a novel data representation as Departures from a simple homogeneous IPD (DIPD) to capture the per-gene-per-cell intrinsic heterogeneity generated by cell clusters. Our experiments using real data and crafted experiments show that using DIPD as a data representation for scRNA-seq data can uncover novel cell subtypes that are missed or can only be found by careful parameter tuning using conventional methods.
Conclusions: This new method has multiple advantages, including (1) no need for prior feature selection or manual optimization of hyperparameters; (2) flexibility to combine with and improve upon other methods, such as Seurat. Another novel contribution is the use of crafted experiments as part of the validation of our newly developed DIPD-based clustering pipeline. This new clustering pipeline is implemented in the R (CRAN) package scpoisson.
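The per-entry Poisson idea in the abstract can be illustrated with a small sketch. It is not the scpoisson implementation and does not reproduce the paper's estimator: it assumes a simple rank-one fit (gene total times cell total divided by grand total) as the "homogeneous" model and uses Pearson-style residuals as a stand-in for the departures. The function name `poisson_departures` and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np


def poisson_departures(counts):
    """Fit a rank-one independent Poisson model and return (lambda_hat, residuals).

    Assumption (not from the paper): lambda_hat[g, c] is estimated as
    gene_total[g] * cell_total[c] / grand_total, and the departure is the
    Pearson residual (count - lambda_hat) / sqrt(lambda_hat).
    """
    counts = np.asarray(counts, dtype=float)
    gene_totals = counts.sum(axis=1, keepdims=True)  # genes are rows
    cell_totals = counts.sum(axis=0, keepdims=True)  # cells are columns
    lam = gene_totals * cell_totals / counts.sum()
    resid = (counts - lam) / np.sqrt(lam + 1e-12)    # guard against lambda = 0
    return lam, resid


# Simulate a sparse genes-by-cells UMI matrix with one Poisson rate per entry.
rng = np.random.default_rng(0)
gene_effect = rng.gamma(shape=0.5, scale=1.0, size=200)
cell_depth = rng.uniform(0.5, 2.0, size=50)
counts = rng.poisson(0.2 * np.outer(gene_effect, cell_depth))

lam_hat, departures = poisson_departures(counts)

# With small per-entry rates, zeros arise without any zero-inflation term:
# the observed zero fraction is close to the mean of exp(-lambda_hat).
zero_fraction = (counts == 0).mean()
expected_zero_fraction = np.exp(-lam_hat).mean()
print(f"observed zeros: {zero_fraction:.3f}, Poisson-expected: {expected_zero_fraction:.3f}")
```

In this toy setting the data contain no cluster structure, so the residual matrix is approximately noise. Entries from cells belonging to a distinct cluster would show up as systematic departures, which is the kind of signal the abstract describes using for clustering.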
Pages: 27