Accurate feature selection improves single-cell RNA-seq cell clustering

Times cited: 40
Authors
Su, Kenong [1]
Yu, Tianwei [2]
Wu, Hao [3]
Affiliations
[1] Emory University, Department of Computer Science, Atlanta, GA 30322, USA
[2] The Chinese University of Hong Kong, School of Data Science, Shenzhen, People's Republic of China
[3] Emory University, Department of Biostatistics and Bioinformatics, Atlanta, GA 30322, USA
Keywords
single-cell RNA sequencing; cell clustering; feature selection; normalization
DOI
10.1093/bib/bbab034
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Cell clustering is one of the most important and commonly performed tasks in single-cell RNA sequencing (scRNA-seq) data analysis. An important step in cell clustering is selecting a subset of genes (referred to as 'features') whose expression patterns are then used for downstream clustering. A good feature set should include the genes that distinguish different cell types, and its quality can have a significant impact on clustering accuracy. All existing scRNA-seq clustering tools include a feature selection step that relies on simple unsupervised methods, mostly based on the statistical moments of gene-wise expression distributions. In this work, we carefully evaluate the impact of feature selection on cell clustering accuracy. In addition, we develop a feature selection algorithm named FEAture SelecTion (FEAST), which provides more representative features. We apply the method to 12 public scRNA-seq datasets and demonstrate that using features selected by FEAST with existing clustering tools significantly improves clustering accuracy.
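The abstract contrasts moment-based feature selection with selecting genes that distinguish cell types. The NumPy sketch below illustrates both ideas on a genes × cells matrix, under stated assumptions. `select_by_dispersion` ranks genes by the variance-to-mean ratio of log-normalized counts, which is one common moment-based criterion; the median-library-size normalization is an assumption. `f_statistic` scores each gene by a one-way ANOVA F-statistic across a given grouping of cells. Both function names are hypothetical, and neither is presented as FEAST's actual procedure. In an unsupervised setting the grouping would have to come from a preliminary clustering, because true cell types are unknown.

```python
import numpy as np


def select_by_dispersion(counts: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k genes (rows) with the highest variance-to-mean
    ratio of log-normalized expression. Hypothetical helper illustrating a
    moment-based criterion; the normalization below is an assumption."""
    counts = np.asarray(counts, dtype=float)
    # Scale each cell (column) to the median library size, guarding empty cells.
    totals = counts.sum(axis=0)
    scale = np.median(totals) / np.where(totals == 0, 1.0, totals)
    logexpr = np.log1p(counts * scale)
    # First two moments of each gene's expression across cells.
    mean = logexpr.mean(axis=1)
    var = logexpr.var(axis=1, ddof=1)
    # Dispersion = variance / mean; genes never detected get 0.
    dispersion = np.where(mean > 0, var / np.where(mean > 0, mean, 1.0), 0.0)
    # Stable sort so tied genes keep their original order.
    return np.argsort(-dispersion, kind="stable")[:k]


def f_statistic(expr: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """One-way ANOVA F-statistic of each gene (row) across the cell groups
    given by `labels`. Hypothetical helper; needs at least 2 groups and
    more cells than groups."""
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    groups = np.unique(labels)
    n_cells, n_groups = expr.shape[1], len(groups)
    grand = expr.mean(axis=1)
    ss_between = np.zeros(expr.shape[0])
    ss_within = np.zeros(expr.shape[0])
    for g in groups:
        sub = expr[:, labels == g]
        m = sub.mean(axis=1)
        # Between-group term: group size times squared distance to grand mean.
        ss_between += sub.shape[1] * (m - grand) ** 2
        # Within-group term: squared deviations from the group mean.
        ss_within += ((sub - m[:, None]) ** 2).sum(axis=1)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_cells - n_groups)
    with np.errstate(divide="ignore", invalid="ignore"):
        f = ms_between / ms_within
    # 0/0 (gene constant in every cell) -> 0; x/0 with x > 0 stays +inf.
    return np.where(np.isnan(f), 0.0, f)
```

For example, genes could be ranked with `np.argsort(-f_statistic(logexpr, labels))`, where `labels` is a hypothetical cluster assignment. A high F-statistic means a gene's between-group variance is large relative to its within-group variance, which matches the abstract's notion of a feature that distinguishes cell types; the dispersion score, by contrast, ignores any grouping of cells.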
Pages: 10