Dirichlet Process Mixture Model for Correcting Technical Variation in Single-Cell Gene Expression Data

Times cited: 0

Authors:

Prabhakaran, Sandhya [1]

Azizi, Elham

Carr, Ambrose

Pe'er, Dana

Affiliations:

[1] Columbia Univ, Dept Biol Sci, New York, NY 10027 USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48 | 2016 / Vol. 48

Funding:

National Science Foundation (USA);

Keywords:

RNA-SEQ; SAMPLING METHODS; GENOME-WIDE; HETEROGENEITY;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We introduce an iterative normalization and clustering method for single-cell gene expression data. The emerging technology of single-cell RNA-seq gives access to gene expression measurements for thousands of cells, allowing discovery and characterization of cell types. However, the data is confounded by technical variation emanating from experimental errors and cell type-specific biases. Current approaches perform a global normalization prior to analyzing biological signals, which does not resolve missing data or variation dependent on latent cell types. Our model is formulated as a hierarchical Bayesian mixture model with cell-specific scalings that aid the iterative normalization and clustering of cells, teasing apart technical variation from biological signals. We demonstrate that this approach is superior to global normalization followed by clustering. We show identifiability and weak convergence guarantees of our method and present a scalable Gibbs inference algorithm. This method improves cluster inference in both synthetic and real single-cell data compared with previous methods, and allows easy interpretation and recovery of the underlying structure and cell types.
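The abstract describes a hierarchical Bayesian mixture with cell-specific scalings. As a rough illustration only, the Python sketch below simulates toy data under assumptions of our own, not the authors' specification. Cluster labels come from a Chinese restaurant process (the partition prior induced by a Dirichlet process with concentration `alpha`). Each cluster has a Gaussian mean profile, and each cell's values are distorted by a hypothetical per-cell scale factor `s` that multiplies the cluster mean and also sets the noise variance. The function names `crp_assignments` and `simulate_cells`, the log-normal prior on `s`, and all numeric settings are illustrative assumptions.

```python
import random

def crp_assignments(n_cells, alpha, rng):
    """Sample cluster labels from a Chinese restaurant process with concentration alpha."""
    labels, counts = [], []
    for i in range(n_cells):
        # Existing cluster k is chosen with prob counts[k] / (i + alpha);
        # a brand-new cluster is opened with prob alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(0)  # open a new cluster
        counts[k] += 1
        labels.append(k)
    return labels, counts

def simulate_cells(n_cells=200, n_genes=5, alpha=1.0, seed=0):
    """Toy data: cluster mean profiles distorted by a hypothetical per-cell scale factor."""
    rng = random.Random(seed)
    labels, counts = crp_assignments(n_cells, alpha, rng)
    # One mean expression profile per cluster (assumed prior: N(0, 3^2) per gene).
    means = [[rng.gauss(0.0, 3.0) for _ in range(n_genes)] for _ in counts]
    scales, data = [], []
    for k in labels:
        s = rng.lognormvariate(0.0, 0.3)  # hypothetical cell-specific technical scaling
        scales.append(s)
        # Observed value: scaled cluster mean plus Gaussian noise with variance s.
        data.append([s * m + rng.gauss(0.0, s ** 0.5) for m in means[k]])
    return labels, scales, data

labels, scales, data = simulate_cells()
print(len(set(labels)), "clusters in", len(data), "cells")
```

Because `s` differs from cell to cell, two cells from the same cluster can look far apart before correction. A model that infers the scalings jointly with the cluster labels, as the abstract describes, can in principle undo this cell-by-cell distortion, whereas a single global normalization applied before clustering does not use the latent cluster structure.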

Pages: 10

References: 42 in total

