scGMAI: a Gaussian mixture model for clustering single-cell RNA-Seq data based on deep autoencoder

Cited by: 45
Authors
Yu, Bin [1]
Chen, Chen [1]
Qi, Ren [2]
Zheng, Ruiqing [3]
Skillman-Lawrence, Patrick J. [4]
Wang, Xiaolin [1]
Ma, Anjun [5]
Gu, Haiming [1]
Affiliations
[1] Qingdao Univ Sci & Technol, Coll Math & Phys, 99 Songling Rd, Qingdao 266061, Peoples R China
[2] Tianjin Univ, Coll Intelligence & Comp, Tianjin, Peoples R China
[3] Cent South Univ, Sch Comp Sci & Engn, Changsha, Peoples R China
[4] Ohio State Univ, Coll Med, Columbus, OH 43210 USA
[5] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
Keywords
scRNA-Seq; autoencoder networks; fast independent component analysis; Gaussian mixture model; cell clustering; EXPRESSION;
DOI
10.1093/bib/bbaa316
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
The rapid development of single-cell RNA sequencing (scRNA-Seq) technology provides strong technical support for the accurate and efficient analysis of single-cell gene expression data. However, the analysis of scRNA-Seq data faces many obstacles, including dropout events and the curse of dimensionality. Here, we propose scGMAI, a new single-cell Gaussian mixture clustering method based on autoencoder networks and fast independent component analysis (FastICA). Specifically, scGMAI uses autoencoder networks to reconstruct gene expression values from scRNA-Seq data, and FastICA to reduce the dimensionality of the reconstructed data. Integrating these computational techniques allows scGMAI to outperform existing tools, including Seurat, in clustering cells from 17 public scRNA-Seq datasets. In summary, scGMAI is an effective tool for accurately clustering and identifying cell types from scRNA-Seq data and shows great potential for scRNA-Seq data analysis.
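The three stages named in the abstract (autoencoder reconstruction, FastICA dimensionality reduction, Gaussian mixture clustering) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn and NumPy, uses `MLPRegressor` trained to reproduce its own input as a stand-in for the paper's autoencoder networks, and assumes the mixture model is fitted on the FastICA output. The function name `cluster_cells`, the log transform, the hidden layer size, and the number of components are all hypothetical choices made for the example.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.mixture import GaussianMixture
from sklearn.neural_network import MLPRegressor


def cluster_cells(expr, n_clusters, n_components=5, seed=0):
    """Reconstruct -> reduce -> cluster, loosely following the abstract.

    expr: (cells x genes) matrix of non-negative expression counts.
    Returns one integer cluster label per cell.
    """
    # Log-transform the counts (an assumed preprocessing step).
    x = np.log1p(np.asarray(expr, dtype=float))

    # Stand-in autoencoder: an MLP fitted to reproduce its own input
    # through a narrow hidden layer. Not the paper's architecture.
    autoencoder = MLPRegressor(hidden_layer_sizes=(32,), activation="relu",
                               max_iter=500, random_state=seed)
    autoencoder.fit(x, x)
    reconstructed = autoencoder.predict(x)

    # FastICA reduces the reconstructed matrix to a few components.
    ica = FastICA(n_components=n_components, random_state=seed, max_iter=1000)
    reduced = ica.fit_transform(reconstructed)

    # A Gaussian mixture model assigns each cell to a cluster.
    gmm = GaussianMixture(n_components=n_clusters, random_state=seed)
    return gmm.fit_predict(reduced)
```

Calling `cluster_cells(counts, n_clusters=3)` on a cells-by-genes count matrix returns an array of labels in the range 0 to 2. In practice the number of clusters would have to be chosen by some model selection criterion, which this sketch leaves to the caller.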
Pages: 10