scSensitiveGeneDefine: sensitive gene detection in single-cell RNA sequencing data by Shannon entropy

Times cited: 5
Authors
Chen, Zechuan [1, 2]
Yang, Zeruo [3]
Yuan, Xiaojun [1]
Zhang, Xiaoming [2]
Hao, Pei [2]
Affiliations
[1] Shanghai Univ, Coll Life Sci, Shanghai, Peoples R China
[2] Chinese Acad Sci, Inst Pasteur Shanghai, Key Lab Mol Virol & Immunol, Shanghai, Peoples R China
[3] Zhejiang YangShengTang Co Ltd, Nat Med Inst, 181 Geyazhuang, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Sensitive genes; Single-cell RNA sequencing; Stochastic gene expression; Unsupervised clustering; EXPRESSION; VARIABILITY;
DOI
10.1186/s12859-021-04136-1
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
Background
Single-cell RNA sequencing (scRNA-seq) is the most widely used technique for obtaining gene expression profiles from complex tissues. Cell subsets and developmental states are often identified through differential gene expression patterns, and most single-cell tools use highly variable genes to annotate cell subsets and states. However, we found that a group of genes that respond sensitively to environmental stimuli, and therefore have high coefficients of variation (CV), may exert an overwhelming influence on cell type annotation.

Results
We developed a method based on CV rank and Shannon entropy to identify these noise genes, which we term "sensitive genes". To validate its reliability, we applied our tool to 11 single-cell data sets from different human tissues. Most of the sensitive genes were enriched in pathways related to the cellular stress response. Furthermore, after removing the sensitive genes detected by our tool, unsupervised clustering results were closer to the ground-truth cell labels.

Conclusion
Our study revealed the prevalence of stochastic gene expression patterns in most cell types, compared cell marker genes, housekeeping (HK) genes, and sensitive genes, showed that sensitive genes have similar functions across various scRNA-seq data sets, and brought unsupervised clustering results closer to the ground-truth labels. We hope our method will provide new insights into reducing noise in scRNA-seq data analysis and contribute to the development of better unsupervised clustering algorithms for scRNA-seq data.
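The abstract names two per-gene statistics, the coefficient of variation (CV) and Shannon entropy, without giving formulas. The sketch below shows one common way to compute them from a gene-by-cell expression table and to rank genes by CV. It is an illustration under stated assumptions, not the scSensitiveGeneDefine implementation: CV is assumed to use the population standard deviation, entropy is assumed to be computed in bits over each gene's expression normalized into a probability distribution across cells, and the function names and toy genes are hypothetical. The abstract does not say how CV rank and entropy are combined to call a gene "sensitive", so no threshold is applied here.

```python
import math
from statistics import mean, pstdev


def coefficient_of_variation(values):
    """CV = population standard deviation / mean (assumption); 0.0 for an all-zero gene."""
    mu = mean(values)
    if mu == 0:
        return 0.0
    return pstdev(values) / mu


def shannon_entropy(values):
    """Shannon entropy in bits of a gene's expression, normalized to a distribution over cells."""
    total = sum(values)
    if total == 0:
        return 0.0
    h = 0.0
    for v in values:
        if v > 0:  # zero-expression cells contribute nothing (0 * log 0 := 0)
            p = v / total
            h -= p * math.log2(p)
    return h


def rank_genes(expr):
    """Hypothetical helper: expr maps gene name -> list of per-cell expression values.

    Returns (gene, cv_rank, cv, entropy) tuples sorted by descending CV,
    with rank 1 assigned to the most variable gene.
    """
    stats = [(g, coefficient_of_variation(v), shannon_entropy(v)) for g, v in expr.items()]
    stats.sort(key=lambda t: t[1], reverse=True)
    return [(g, rank, cv, h) for rank, (g, cv, h) in enumerate(stats, start=1)]


if __name__ == "__main__":
    # Toy data for four cells; the gene names are placeholders, not real genes.
    toy = {
        "GENE_A": [1, 1, 1, 1],  # uniform expression across cells
        "GENE_B": [0, 0, 0, 8],  # expressed in a single cell
        "GENE_C": [2, 0, 2, 0],  # expressed in half of the cells
    }
    for gene, rank, cv, h in rank_genes(toy):
        print(f"{rank}\t{gene}\tCV={cv:.3f}\tH={h:.3f}")
```

For this toy input, GENE_B ranks first with CV equal to the square root of 3 (about 1.732) and entropy 0, while the uniformly expressed GENE_A ranks last with CV 0 and the maximal entropy of 2 bits for four cells. The two statistics are complementary: CV measures spread relative to the mean, while entropy measures how evenly expression is distributed across cells.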
Pages: 13