Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data

Cited by: 0
Authors
Prabhakaran, Sandhya [1]
Azizi, Elham
Carr, Ambrose
Pe'er, Dana
Affiliations
[1] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48
Funding
National Science Foundation (USA);
Keywords
RNA-SEQ; SAMPLING METHODS; GENOME-WIDE; HETEROGENEITY;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
Pages: 10