glmGamPoi: fitting Gamma-Poisson generalized linear models on single cell count data

Cited by: 105
Authors
Ahlmann-Eltze, Constantin [1]
Huber, Wolfgang [1]
Affiliations
[1] EMBL, Genome Biol Unit, D-69117 Heidelberg, Germany
Funding
European Research Council;
Keywords
DIFFERENTIAL EXPRESSION ANALYSIS;
DOI
10.1093/bioinformatics/btaa1009
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: The Gamma-Poisson distribution is a theoretically and empirically motivated model for the sampling variability of single cell RNA-sequencing counts and an essential building block for analysis approaches including differential expression analysis, principal component analysis and factor analysis. Existing implementations for inferring its parameters from data often struggle with the size of single cell datasets, which can comprise millions of cells; at the same time, they do not take full advantage of the fact that zero and other small numbers are frequent in the data. These limitations have hampered uptake of the model, leaving room for statistically inferior approaches such as logarithm(-like) transformation. Results: We present a new R package for fitting the Gamma-Poisson distribution to data with the characteristics of modern single cell datasets more quickly and more accurately than existing methods. The software can work with data on disk without having to load them into RAM simultaneously. Availability and implementation: The package glmGamPoi is available from Bioconductor for Windows, macOS and Linux, and source code is available on github.com/const-ae/glmGamPoi under a GPL-3 license. The scripts to reproduce the results of this paper are available on github.com/const-ae/glmGamPoi-Paper. Contact: constantin.ahlmann@embl.de Supplementary information: Supplementary data are available at Bioinformatics online.
Pages: 5701-5702
Page count: 2