A Novel Single-Cell RNA Sequencing Data Feature Extraction Method Based on Gene Function Analysis and Its Applications in Glioma Study

Cited by: 3
Authors
Zhuang, Jujuan [1 ]
Ren, Changjing [1 ]
Ren, Dan [2 ]
Li, Yu'ang [3 ]
Liu, Danyang [1 ]
Cui, Lingyu [1 ]
Tian, Geng [4 ]
Yang, Jiasheng [5 ]
Liu, Jingbo [2 ]
Affiliations
[1] Dalian Maritime Univ, Sch Sci, Dalian, Liaoning, Peoples R China
[2] Qiqihar Med Univ, Da Qing Long Nan Hosp, Pathol Dept, Qiqihar, Heilongjiang, Peoples R China
[3] Univ Nottingham, Maths & Appl Math, Nottingham, England
[4] Geneis Beijing Co Ltd, Beijing, Peoples R China
[5] Anhui Univ Technol, Sch Elect & Informat Engn, Hefei, Anhui, Peoples R China
Source
FRONTIERS IN ONCOLOGY | 2021 / Vol. 11
Funding
National Natural Science Foundation of China;
Keywords
single-cell RNA sequencing; GO enrichment analysis; KPCA; semantic similarity analysis; Gene Ontology; SEQ DATA; HETEROGENEITY; DIVERSITY; FATE; TOOL;
DOI
10.3389/fonc.2021.797057
Chinese Library Classification
R73 [Oncology];
Discipline classification code
100214;
Abstract
Cell clustering based on single-cell RNA sequencing (scRNA-seq) is critical for revealing cell heterogeneity and identifying new cell subtypes, but it remains challenging. Due to the high noise, sparsity, and poor annotation of scRNA-seq data, existing state-of-the-art cell clustering methods usually ignore gene functions and gene interactions. In this study, we propose a feature extraction method, named FEGFS, to analyze scRNA-seq data, taking advantage of known gene functions. Specifically, we first derive functional gene sets based on Gene Ontology (GO) terms and reduce their redundancy by semantic similarity analysis and gene repetitive rate reduction. Then, we apply kernel principal component analysis (KPCA) to select features on each non-redundant functional gene set, and we combine the features selected for each functional gene set for subsequent clustering analysis. To test the performance of FEGFS, we apply agglomerative hierarchical clustering to the FEGFS features and compare it with seven state-of-the-art clustering methods on six real scRNA-seq datasets. For small datasets like Pollen and Goolam, FEGFS outperforms all methods on all four evaluation metrics: adjusted Rand index (ARI), normalized mutual information (NMI), homogeneity score (HOM), and completeness score (COM). For example, the ARIs of FEGFS are 0.955 and 0.910 on Pollen and Goolam, respectively, and those of the second-best method are 0.938 and 0.910, respectively. For large datasets, FEGFS also outperforms most methods. For example, the ARIs of FEGFS are 0.781 on both Klein and Zeisel, which are higher than those of all other methods except SC3, whose ARIs are slightly higher (0.798 and 0.807, respectively). Moreover, we demonstrate that CMF-Impute is powerful in reconstructing cell-to-cell and gene-to-gene correlation and in inferring cell lineage trajectories.
As an application, taking glioma as an example, we demonstrate that our clustering method can identify important cell clusters related to glioma and infer key marker genes related to these cell clusters.
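The feature extraction and evaluation steps described in the abstract can be illustrated with a minimal sketch using scikit-learn. This is not the authors' FEGFS implementation: the GO-based derivation of gene sets and the redundancy reduction are not reproduced, the gene sets and their names are hypothetical, the data are synthetic, and the RBF kernel, the number of components per gene set, and the default (Ward) linkage are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.decomposition import KernelPCA
from sklearn.metrics import (
    adjusted_rand_score,
    completeness_score,
    homogeneity_score,
    normalized_mutual_info_score,
)


def extract_features(expr, gene_sets, n_components=2, kernel="rbf"):
    """Run KPCA on each gene set's columns and concatenate the results.

    expr      : (n_cells, n_genes) expression matrix
    gene_sets : dict mapping a set name to a list of gene column indices
    The kernel and n_components are assumed values, not taken from the paper.
    """
    blocks = []
    for cols in gene_sets.values():
        sub = expr[:, cols]
        # Never request more components than the data can support.
        k = min(n_components, sub.shape[1], sub.shape[0] - 1)
        blocks.append(KernelPCA(n_components=k, kernel=kernel).fit_transform(sub))
    return np.hstack(blocks)


def cluster_and_score(features, true_labels, n_clusters):
    """Agglomerative clustering followed by the four metrics in the abstract."""
    pred = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(features)
    scores = {
        "ARI": adjusted_rand_score(true_labels, pred),
        "NMI": normalized_mutual_info_score(true_labels, pred),
        "HOM": homogeneity_score(true_labels, pred),
        "COM": completeness_score(true_labels, pred),
    }
    return pred, scores


# Synthetic data: 40 cells in two groups, 12 genes; group 1 is shifted
# in the first six genes so that two of the three gene sets carry signal.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 20)
expr = rng.normal(0.0, 0.3, size=(40, 12))
expr[labels == 1, :6] += 5.0

# Hypothetical, non-overlapping "functional gene sets" (column indices).
gene_sets = {
    "set_a": [0, 1, 2, 3],
    "set_b": [4, 5, 6, 7],
    "set_c": [8, 9, 10, 11],
}

features = extract_features(expr, gene_sets)
pred, scores = cluster_and_score(features, labels, n_clusters=2)
print(features.shape)
print(scores)
```

Running KPCA separately on each gene set keeps every set's contribution to the combined feature matrix at a fixed number of columns, regardless of how many genes the set contains.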
Pages: 9