Integrating Deep Supervised, Self-Supervised and Unsupervised Learning for Single-Cell RNA-seq Clustering and Annotation

Times cited: 36
Authors
Chen, Liang [1]
Zhai, Yuyao [2]
He, Qiuyan [1]
Wang, Weinan [1]
Deng, Minghua [1,3,4]
Affiliations
[1] Peking Univ, Sch Math Sci, Beijing 100871, Peoples R China
[2] Northeast Normal Univ, Math & Stat Inst, Changchun 130024, Peoples R China
[3] Peking Univ, Ctr Quantitat Biol, Beijing 100871, Peoples R China
[4] Peking Univ, Ctr Stat Sci, Beijing 100871, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
single-cell RNA sequencing; clustering and annotation; supervised learning; self-supervised learning; unsupervised learning; ATLAS
DOI
10.3390/genes11070792
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
As single-cell RNA sequencing technologies mature, massive gene expression profiles can be obtained. Consequently, cell clustering and annotation have become two crucial and fundamental procedures that affect specific downstream analyses. Most existing single-cell RNA-seq (scRNA-seq) clustering algorithms do not take into account the cell annotation results already available for the same tissues or organisms from other laboratories, even though such data could assist and guide the clustering of the target dataset. Manually annotating large numbers of cells by identifying marker genes through differential expression analysis is also labor- and resource-intensive. Therefore, in this paper, we propose scAnCluster, a novel end-to-end framework for supervised cell clustering and annotation that fully utilizes the cell type labels available in reference data to facilitate clustering and annotation of the unlabeled target data. Our algorithm integrates deep supervised learning, self-supervised learning and unsupervised learning, and it outperforms other customized scRNA-seq supervised clustering methods on both simulated and real data. Notably, our method performs well on the challenging task of discovering novel cell types that are absent from the reference data.
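The core idea in the abstract, transferring reference cell type labels to unlabeled target cells while leaving room for types the reference lacks, can be illustrated with a deliberately simple nearest-centroid sketch. This is not scAnCluster, which integrates deep supervised, self-supervised and unsupervised learning in an end-to-end framework; the function name `annotate_with_reference`, the assumption that reference and target cells already share one feature space, and the `novel_threshold` distance cut-off are all hypothetical choices made for illustration.

```python
import numpy as np


def annotate_with_reference(ref_x, ref_labels, tgt_x, novel_threshold):
    """Hypothetical nearest-centroid label transfer (NOT scAnCluster).

    ref_x, tgt_x: (cells x features) arrays, assumed to share one feature space.
    ref_labels: cell type label for each reference cell.
    Target cells farther than novel_threshold from every reference centroid
    are flagged as "novel" (candidate cell types absent from the reference).
    """
    types = np.unique(ref_labels)
    # One centroid per annotated reference cell type
    centroids = np.stack([ref_x[ref_labels == t].mean(axis=0) for t in types])
    # Euclidean distance from every target cell to every centroid
    dists = np.linalg.norm(tgt_x[:, None, :] - centroids[None, :, :], axis=2)
    # Object dtype so the "novel" label is never truncated
    labels = types[dists.argmin(axis=1)].astype(object)
    # Cells far from every known type get the "novel" label
    labels[dists.min(axis=1) > novel_threshold] = "novel"
    return labels


# Toy data: two tight reference clusters and three target cells
rng = np.random.default_rng(0)
ref_x = np.vstack([rng.normal(0, 0.1, (20, 2)), rng.normal(5, 0.1, (20, 2))])
ref_labels = np.array(["alpha"] * 20 + ["beta"] * 20)
tgt_x = np.array([[0.05, 0.0], [5.1, 4.9], [10.0, -10.0]])
result = annotate_with_reference(ref_x, ref_labels, tgt_x, novel_threshold=1.0)
print(list(result))  # ['alpha', 'beta', 'novel']
```

A fixed distance threshold is the simplest way to leave room for unseen types; the abstract's approach instead learns the representation and the clustering jointly, which a sketch like this does not attempt.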
Pages: 1-20
Page count: 20