Single-cell RNA-seq interpretations using evolutionary multiobjective ensemble pruning

Cited by: 28
Authors
Li, Xiangtao [1, 2]
Zhang, Shixiong [2]
Wong, Ka-Chun [2]
Affiliations
[1] Northeast Normal Univ, Sch Comp Sci & Informat Technol, Changchun, Jilin, Peoples R China
[2] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
GENE-EXPRESSION; HETEROGENEITY;
DOI
10.1093/bioinformatics/bty1056
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: In recent years, single-cell RNA sequencing has enabled us to discover cell types and even sub-types. Its increasing availability provides opportunities to identify cell populations from single-cell RNA-seq data. Computational methods have been employed to reveal the gene expression variations among multiple cell populations. Unfortunately, existing methods can suffer from practical limitations such as experimental noise, numerical instability, high dimensionality and poor computational scalability.
Results: We propose an evolutionary multiobjective ensemble pruning algorithm (EMEP) that addresses those limitations. EMEP first applies unsupervised dimensionality reduction to project the data from the original high-dimensional space to low-dimensional subspaces; basic clustering algorithms are then applied in those subspaces to generate different clustering results, which together form cluster ensembles. However, most of those cluster ensembles are unnecessarily bulky and incur extra time and memory costs. To overcome that problem, EMEP is designed to dynamically select suitable clustering results from the ensembles. Moreover, to guide the multiobjective ensemble evolution, three cluster validity indices (the overall cluster deviation, the within-cluster compactness and the number of basic partition clusters) are formulated as the objective functions of an evolutionary multiobjective optimization that drives its cell type discovery performance. We applied EMEP to 55 simulated datasets and seven real single-cell RNA-seq datasets, including six single-cell RNA-seq datasets and one large-scale dataset with 3005 cells and 4412 genes. Two case studies were also conducted to reveal mechanistic insights into the biological relevance of EMEP. We found that EMEP achieves superior performance over the other clustering algorithms, demonstrating that EMEP can identify cell populations clearly.
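The abstract describes selecting clustering results under three objectives at once. As a minimal sketch of the general idea behind multiobjective selection (Pareto dominance), and not the authors' implementation, the Python snippet below filters candidate pruned ensembles down to the non-dominated ones. The function names `dominates` and `pareto_front`, the objective values, and the assumption that all three objectives are minimized are illustrative only; the record does not state how EMEP defines or orients its objectives.

```python
from typing import List, Sequence, Tuple

# One candidate = (cluster deviation, within-cluster compactness, number of clusters).
# Assumption for this sketch: every objective is to be minimized.
Objectives = Tuple[float, float, float]


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b everywhere and strictly better somewhere."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(candidates: List[Objectives]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, a in enumerate(candidates)
        if not any(dominates(b, a) for j, b in enumerate(candidates) if j != i)
    ]


# Hypothetical objective values for four candidate pruned ensembles.
candidates: List[Objectives] = [
    (0.40, 0.30, 5.0),
    (0.35, 0.45, 4.0),
    (0.50, 0.50, 6.0),  # worse than the first candidate on all three objectives
    (0.40, 0.30, 7.0),  # ties the first candidate twice, worse on cluster count
]

front = pareto_front(candidates)
print(front)
```

An evolutionary multiobjective optimizer would apply a filter of this kind repeatedly to a population of candidate selections; the variation operators and the actual objective definitions are outside what this record specifies.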
Pages: 2809-2817
Number of pages: 9