Dimension Reduction and Clustering Models for Single-Cell RNA Sequencing Data: A Comparative Study

Cited by: 35
Authors
Feng, Chao [1]
Liu, Shufen [1]
Zhang, Hao [1]
Guan, Renchu [1,2]
Li, Dan [3,4]
Zhou, Fengfeng [1]
Liang, Yanchun [1,2]
Feng, Xiaoyue [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, Minist Educ, Key Lab Symbol Computat & Knowledge Engn, Changchun 130012, Peoples R China
[2] Jilin Univ, Zhuhai Coll, Zhuhai Sub Lab, Key Lab Symbol Computat & Knowledge Engn, Minist E, Zhuhai 519041, Peoples R China
[3] Univ Arkansas, Little Rock George Washington Donaghey Coll Engn, Joint Bioinformat Program, Little Rock, AR 72204 USA
[4] Univ Arkansas Med Sci, Little Rock, AR 72204 USA
Funding
National Natural Science Foundation of China
Keywords
single-cell RNA sequencing; dimensionality reduction; clustering algorithm; NONNEGATIVE MATRIX FACTORIZATION; ALGORITHMS; SEQ; IDENTIFICATION; EMBRYOS;
DOI
10.3390/ijms21062181
Chinese Library Classification
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
With recent advances in single-cell RNA sequencing, enormous transcriptome datasets have been generated. These datasets have furthered our understanding of cellular heterogeneity and its underlying mechanisms in homogeneous populations. Clustering of single-cell RNA sequencing (scRNA-seq) data can group cells of the same cell type based on patterns embedded in gene expression. However, scRNA-seq data are high-dimensional, noisy, and sparse, owing to the limitations of existing scRNA-seq technologies. Traditional clustering methods are neither effective nor efficient for high-dimensional, sparse matrix computations. Therefore, several dimension reduction methods have been introduced. To validate a reliable and standard research routine, we conducted a comprehensive review and evaluation of four classical dimension reduction methods and five clustering models. Four experiments were performed progressively on two large scRNA-seq datasets using 20 models (each of the four dimension reduction methods paired with each of the five clustering models). The results showed that feature selection benefited the analysis of high-dimensional, sparse scRNA-seq data. Moreover, feature-extraction methods were able to improve clustering performance, although the improvement did not hold in every setting. Independent component analysis (ICA) performed well in small compressed feature spaces, whereas principal component analysis was more stable than all the other feature-extraction methods. In addition, ICA was not ideal for fuzzy C-means clustering in scRNA-seq data analysis. K-means clustering achieved good results when combined with feature-extraction methods.
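The routine the abstract describes, reducing the dimensionality of the expression matrix and then clustering the cells, can be illustrated with a minimal sketch. It makes several assumptions that do not come from the paper: the abstract does not name its feature selection criterion, so variance-based gene selection stands in for it here; the data are a synthetic toy matrix rather than a real scRNA-seq dataset; and all function names (`select_top_variance_genes`, `kmeans`, `make_toy_counts`) are hypothetical. K-means is written out in plain Python with a deterministic farthest-first initialization to keep the example self-contained.

```python
import random
from statistics import pvariance


def select_top_variance_genes(matrix, n_genes):
    """Feature selection (assumed criterion): keep the n_genes most variable columns."""
    n_cols = len(matrix[0])
    variances = [pvariance([row[j] for row in matrix]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    keep = sorted(ranked[:n_genes])
    return [[row[j] for j in keep] for row in matrix], keep


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with farthest-first initialization (a simplification)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        farthest = max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: squared_distance(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def make_toy_counts(cells_per_group=20, n_noise=20, n_markers=5, seed=0):
    """Synthetic cells x genes matrix: sparse noise genes plus group-specific markers."""
    rng = random.Random(seed)
    rows = []
    for group in (0, 1):
        for _ in range(cells_per_group):
            noise = [1.0 if rng.random() < 0.1 else 0.0 for _ in range(n_noise)]
            high = [rng.uniform(8.0, 10.0) for _ in range(n_markers)]
            silent = [0.0] * n_markers
            rows.append(noise + (high + silent if group == 0 else silent + high))
    return rows


counts = make_toy_counts()                      # 40 cells x 30 genes
reduced, kept_genes = select_top_variance_genes(counts, n_genes=10)
labels = kmeans(reduced, k=2)
print("selected gene columns:", kept_genes)
print("cluster labels:", labels)
```

On this toy matrix the selection step keeps only the ten marker columns and the two cell groups fall into separate clusters. Real scRNA-seq data are far noisier, so a feature-extraction step such as PCA or ICA would typically replace or follow the selection step in such a pipeline.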
Pages: 21