scMAE: a masked autoencoder for single-cell RNA-seq clustering

Cited by: 14
Authors
Fang, Zhaoyu [1]
Zheng, Ruiqing [1]
Li, Min [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
HETEROGENEITY; MODEL;
DOI
10.1093/bioinformatics/btae020
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Single-cell RNA sequencing (scRNA-seq) has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes.
Results: Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations.
Availability and implementation: The source code of scMAE is available at: https://zenodo.org/records/10465991.
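The perturb-then-reconstruct idea in the abstract, with a masking predictor trained alongside the reconstruction, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the corruption rule (swapping in the same gene's value from a random cell), the mean-squared-error and binary cross-entropy losses, the `weight` parameter, and all function names. No neural network is defined; the encoder, decoder and masking predictor outputs are replaced by placeholder values.

```python
import math
import random


def corrupt(X, mask_prob, rng):
    """Perturb a cells-by-genes matrix (assumed corruption rule).

    Each entry is, with probability mask_prob, replaced by the same gene's
    value taken from a randomly chosen cell. Returns the corrupted matrix
    and a 0/1 mask marking which entries were replaced.
    """
    n_cells, n_genes = len(X), len(X[0])
    X_corrupted = [row[:] for row in X]
    mask = [[0] * n_genes for _ in range(n_cells)]
    for i in range(n_cells):
        for j in range(n_genes):
            if rng.random() < mask_prob:
                # The donor may be the cell itself; the entry still counts as masked.
                donor = rng.randrange(n_cells)
                X_corrupted[i][j] = X[donor][j]
                mask[i][j] = 1
    return X_corrupted, mask


def reconstruction_loss(X, X_hat):
    """Mean squared error between the original and reconstructed matrices."""
    total = sum(
        (x - x_hat) ** 2
        for row, row_hat in zip(X, X_hat)
        for x, x_hat in zip(row, row_hat)
    )
    return total / (len(X) * len(X[0]))


def mask_prediction_loss(mask, probs, eps=1e-7):
    """Binary cross-entropy between the true mask and predicted probabilities."""
    total = 0.0
    for mask_row, prob_row in zip(mask, probs):
        for m, p in zip(mask_row, prob_row):
            p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
            total -= m * math.log(p) + (1 - m) * math.log(1.0 - p)
    return total / (len(mask) * len(mask[0]))


def total_loss(X, X_hat, mask, probs, weight=1.0):
    """Combined objective: reconstruction plus weighted mask prediction."""
    return reconstruction_loss(X, X_hat) + weight * mask_prediction_loss(mask, probs)


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[0.0, 2.0, 1.0], [3.0, 0.0, 0.0], [1.0, 1.0, 4.0]]  # toy expression matrix
    X_corrupted, mask = corrupt(X, mask_prob=0.3, rng=rng)
    # Placeholders for network outputs: a "decoder" that echoes its input
    # and a "masking predictor" that always outputs 0.5.
    X_hat = X_corrupted
    probs = [[0.5] * len(X[0]) for _ in X]
    print(total_loss(X, X_hat, mask, probs))
```

In a trained model, `X_hat` and `probs` would come from a decoder and a masking predictor that share an encoder, and the encoder's cell representations would then be clustered. Predicting which entries were perturbed is only possible by comparing a gene's value with the rest of the cell's profile, which is how such an objective can push the representation to encode gene-gene relationships.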
Pages: 10