Cell Subtype Classification via Representation Learning Based on a Denoising Autoencoder for Single-Cell RNA Sequencing

Cited by: 6
Authors
Choi, Joungmin [1]
Rhee, Je-Keun [2]
Chae, Heejoon [1]
Affiliations
[1] Sookmyung Womens Univ, Div Comp Sci, Seoul 04310, South Korea
[2] Soongsil Univ, Sch Syst Biomed Sci, Seoul 06978, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Feature extraction; Gene expression; Biology; Data models; Biological system modeling; RNA; Neural networks; Cell subtype; classification; gene expression; scRNA-seq; single-cell; SEQ DATA; GENE-EXPRESSION; TUMOR; TRANSCRIPTOMES; HETEROGENEITY; HEALTH;
DOI
10.1109/ACCESS.2021.3052923
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Identifying single-cell subtypes from single-cell RNA sequencing (scRNA-seq) data is one of the fundamental steps in understanding a heterogeneous population composed of multiple cells. Previously, cell subtype identification was mainly carried out by dimension reduction and clustering approaches that grouped cells with similar expression profiles. However, supervised classification approaches have come into wide use because they are more robust to noise and annotate the subtype of each cell systematically. Recently, deep neural network (DNN) models have been widely applied in various fields, including biology. By capturing the composite relationship between sample features and target outcomes, a DNN model enables significant performance improvements in biological data mining analyses. In this paper, we constructed a DNN model, called scDAE, for single-cell subtype identification combined with representative feature extraction using a multilayer denoising autoencoder (DAE). The feature sets were learned by the DAE and were further tuned by fully connected layers with a softmax classifier. The model was compared against four state-of-the-art cell subtype identification methods and two conventional machine learning algorithms. Across multiple tests, scDAE significantly outperformed the competing methods, especially on data sets with a large number of cell subtypes and noise. Cell features extracted by the proposed model clustered clearly by subtype. The experimental results indicate that the proposed model is effective in identifying single-cell subtypes and the molecular signatures representative of each cell subtype. scDAE is publicly available at https://github.com/cbi-bioinfo/scDAE.
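The architecture summarized in the abstract (a denoising autoencoder whose learned features feed fully connected layers and a softmax classifier) can be illustrated with a minimal forward-pass sketch. This is not the scDAE implementation from the repository above. The layer sizes, the masking-noise corruption, the ReLU activation, the single hidden layer, and all function names are assumptions chosen for illustration, and the weights are random and untrained.

```python
import math
import random


def corrupt(x, drop_prob, rng):
    """Masking noise (an assumed noise type): zero each value with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def dense(x, weights, biases):
    """Affine layer; weights holds one row of input weights per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    """Element-wise ReLU (an assumed activation)."""
    return [max(0.0, v) for v in x]


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Random uniform weights and zero biases for an n_in -> n_out layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


rng = random.Random(0)
n_genes, n_hidden, n_subtypes = 8, 3, 4  # hypothetical toy sizes

encoder = init_layer(n_genes, n_hidden, rng)
decoder = init_layer(n_hidden, n_genes, rng)
classifier = init_layer(n_hidden, n_subtypes, rng)

# One toy expression profile (random values standing in for normalized counts).
expression = [rng.random() for _ in range(n_genes)]

# Denoising objective: encode a corrupted input, reconstruct the clean one.
noisy = corrupt(expression, 0.3, rng)
code = relu(dense(noisy, *encoder))
reconstruction = dense(code, *decoder)
recon_loss = mse(reconstruction, expression)

# Classification path: encoder features -> fully connected layer -> softmax.
features = relu(dense(expression, *encoder))
probs = softmax(dense(features, *classifier))
predicted = max(range(n_subtypes), key=probs.__getitem__)

print("reconstruction loss:", recon_loss)
print("subtype probabilities:", probs)
print("predicted subtype index:", predicted)
```

In a trained model the encoder weights would first be fitted by minimizing the reconstruction loss and then tuned together with the classifier on labeled cells; the sketch only shows how the two paths share the encoder.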
Pages: 14540-14548
Number of pages: 9