scMAE: a masked autoencoder for single-cell RNA-seq clustering

Cited by: 7
Authors
Fang, Zhaoyu [1]
Zheng, Ruiqing [1]
Li, Min [1]
Affiliations
[1] Cent South Univ, Sch Comp Sci & Engn, 932 South Lushan Rd, Changsha 410083, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
HETEROGENEITY; MODEL;
DOI
10.1093/bioinformatics/btae020
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Single-cell RNA sequencing has emerged as a powerful technology for studying gene expression at the individual cell level. Clustering individual cells into distinct subpopulations is fundamental in scRNA-seq data analysis, facilitating the identification of cell types and exploration of cellular heterogeneity. Despite the recent development of many deep learning-based single-cell clustering methods, few have effectively exploited the correlations among genes, resulting in suboptimal clustering outcomes.
Results: Here, we propose a novel masked autoencoder-based method, scMAE, for cell clustering. scMAE perturbs gene expression and employs a masked autoencoder to reconstruct the original data, learning robust and informative cell representations. The masked autoencoder introduces a masking predictor, which captures relationships among genes by predicting whether gene expression values are masked. By integrating this masking mechanism, scMAE effectively captures latent structures and dependencies in the data, enhancing clustering performance. We conducted extensive comparative experiments using various clustering evaluation metrics on 15 scRNA-seq datasets from different sequencing platforms. Experimental results indicate that scMAE outperforms other state-of-the-art methods on these datasets. In addition, scMAE accurately identifies rare cell types, which are challenging to detect due to their low abundance. Furthermore, biological analyses confirm the biological significance of the identified cell subpopulations.
Availability and implementation: The source code of scMAE is available at: https://zenodo.org/records/10465991.
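The masking idea described in the abstract (perturb some expression values, reconstruct the originals, and predict which entries were perturbed) can be illustrated with a small NumPy sketch. This is not the authors' implementation; that is the code at the Zenodo link. Every name here (`corrupt_expression`, `forward`, `mask_prob`, the `params` dictionary) is hypothetical, and the perturbation rule (replacing a masked value with the same gene's value from a random cell), the single-layer `tanh` encoder, and the linear heads are assumptions chosen only to keep the example short. The abstract does not specify any of them.

```python
import numpy as np


def corrupt_expression(x, mask_prob=0.3, seed=0):
    """Perturb a cells-by-genes matrix and return (corrupted, mask).

    Assumption: a masked entry is replaced by the value of the same gene
    taken from a randomly chosen cell. The abstract only says that gene
    expression is "perturbed".
    """
    rng = np.random.default_rng(seed)
    n_cells, n_genes = x.shape
    mask = rng.random((n_cells, n_genes)) < mask_prob
    # One random donor cell per entry; the gene (column) index is unchanged.
    donor_rows = rng.integers(0, n_cells, size=(n_cells, n_genes))
    donor_vals = x[donor_rows, np.arange(n_genes)]
    corrupted = np.where(mask, donor_vals, x)
    return corrupted, mask.astype(np.float32)


def forward(x_corrupt, params):
    """Toy forward pass: embedding, reconstruction, and mask logits."""
    h = np.tanh(x_corrupt @ params["enc"])   # cell representation
    x_hat = h @ params["dec"]                # reconstructed expression
    mask_logits = h @ params["mask_head"]    # per-gene "was it masked?" score
    return h, x_hat, mask_logits


def reconstruction_loss(x_hat, x):
    """Mean squared error against the original (unperturbed) data."""
    return float(np.mean((x_hat - x) ** 2))


def mask_prediction_loss(mask_logits, mask):
    """Numerically stable binary cross-entropy on logits."""
    z, y = mask_logits, mask
    return float(np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z)))))


# Synthetic log-transformed counts: 8 cells, 5 genes (illustrative only).
rng = np.random.default_rng(1)
x = np.log1p(rng.poisson(2.0, size=(8, 5)).astype(np.float32))

x_corrupt, mask = corrupt_expression(x)
params = {
    "enc": rng.normal(scale=0.1, size=(5, 3)),
    "dec": rng.normal(scale=0.1, size=(3, 5)),
    "mask_head": rng.normal(scale=0.1, size=(3, 5)),
}
h, x_hat, mask_logits = forward(x_corrupt, params)
total_loss = reconstruction_loss(x_hat, x) + mask_prediction_loss(mask_logits, mask)
```

In a real training loop the two losses would be minimised jointly with an autodiff framework, and the learned representation `h` would then be passed to a clustering algorithm. The relative weighting of the two terms is left unspecified here because the abstract does not state it.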
Pages: 10