DeepGSEA: explainable deep gene set enrichment analysis for single-cell transcriptomic data

Cited by: 1
Authors
Xiong, Guangzhi [1]
Leroy, Nathan J. [2]
Bekiranov, Stefan [3]
Sheffield, Nathan C. [2]
Zhang, Aidong [1]
Affiliations
[1] Univ Virginia, Dept Comp Sci, 85 Engineers Way, Charlottesville, VA 22904 USA
[2] Univ Virginia, Ctr Publ Hlth Genom, Charlottesville, VA 22904 USA
[3] Univ Virginia, Dept Biochem & Mol Genet, Charlottesville, VA 22908 USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
REVEALS; WHETHER; CA1;
DOI
10.1093/bioinformatics/btae434
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: Gene set enrichment (GSE) analysis allows for an interpretation of gene expression through pre-defined gene set databases and is a critical step in understanding different phenotypes. With the rapid development of single-cell RNA sequencing (scRNA-seq) technology, GSE analysis can be performed on fine-grained gene expression data to gain a nuanced understanding of phenotypes of interest. However, with the cellular heterogeneity in single-cell gene profiles, current statistical GSE analysis methods sometimes fail to identify enriched gene sets. Meanwhile, deep learning has gained traction in applications like clustering and trajectory inference in single-cell studies due to its prowess in capturing complex data patterns. However, its use in GSE analysis remains limited, due to interpretability challenges.

Results: In this paper, we present DeepGSEA, an explainable deep gene set enrichment analysis approach which leverages the expressiveness of interpretable, prototype-based neural networks to provide an in-depth analysis of GSE. DeepGSEA learns the ability to capture GSE information through our designed classification tasks, and significance tests can be performed on each gene set, enabling the identification of enriched sets. The underlying distribution of a gene set learned by DeepGSEA can be explicitly visualized using the encoded cell and cellular prototype embeddings. We demonstrate the performance of DeepGSEA over commonly used GSE analysis methods by examining their sensitivity and specificity with four simulation studies. In addition, we test our model on three real scRNA-seq datasets and illustrate the interpretability of DeepGSEA by showing how its results can be explained.

Availability and implementation: https://github.com/Teddy-XiongGZ/DeepGSEA
Pages: 10
Related papers
34 in total
  • [1] Aibar S, 2017, NAT METHODS, V14, P1083, DOI 10.1038/nmeth.4463
  • [2] Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1
    Barbie, David A.
    Tamayo, Pablo
    Boehm, Jesse S.
    Kim, So Young
    Moody, Susan E.
    Dunn, Ian F.
    Schinzel, Anna C.
    Sandy, Peter
    Meylan, Etienne
    Scholl, Claudia
    Froehling, Stefan
    Chan, Edmond M.
    Sos, Martin L.
    Michel, Kathrin
    Mermel, Craig
    Silver, Serena J.
    Weir, Barbara A.
    Reiling, Jan H.
    Sheng, Qing
    Gupta, Piyush B.
    Wadlow, Raymond C.
    Le, Hanh
    Hoersch, Sebastian
    Wittner, Ben S.
    Ramaswamy, Sridhar
    Livingston, David M.
    Sabatini, David M.
    Meyerson, Matthew
    Thomas, Roman K.
    Lander, Eric S.
    Mesirov, Jill P.
    Root, David E.
    Gilliland, D. Gary
    Jacks, Tyler
    Hahn, William C.
    [J]. NATURE, 2009, 462 (7269) : 108 - U122
  • [3] CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING
    BENJAMINI, Y
    HOCHBERG, Y
    [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) : 289 - 300
  • [4] Systematic single-cell pathway analysis to characterize early T cell activation
    Bibby, Jack A.
    Agarwal, Divyansh
    Freiwald, Tilo
    Kunz, Natalia
    Merle, Nicolas S.
    West, Erin E.
    Singh, Parul
    Larochelle, Andre
    Chinian, Fariba
    Mukherjee, Somabha
    Afzali, Behdad
    Kemper, Claudia
    Zhang, Nancy R.
    [J]. CELL REPORTS, 2022, 41 (08):
  • [5] Cao K., 2021, 9 INT C LEARN REPR I
  • [6] Functional interpretation of single cell similarity maps
    De Tomaso, David
    Jones, Matthew G.
    Subramaniam, Meena
    Ashuach, Tal
    Ye, Chun J.
    Yosef, Nir
    [J]. NATURE COMMUNICATIONS, 2019, 10 (1)
  • [7] GSEApy: a comprehensive package for performing gene set enrichment analysis in Python
    Fang, Zhuoqing
    Liu, Xinyuan
    Peltz, Gary
    [J]. BIOINFORMATICS, 2023, 39 (01)
  • [8] An introduction to ROC analysis
    Fawcett, Tom
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (08) : 861 - 874
  • [9] Fisher R.A., 1970, Breakthroughs in Statistics: Methodology and Distribution, P66, DOI 10.1007/978-1-4612-4380-9_6
  • [10] Single-cell gene set enrichment analysis and transfer learning for functional annotation of scRNA-seq data
    Franchini, Melania
    Pellecchia, Simona
    Viscido, Gaetano
    Gambardella, Gennaro
    [J]. NAR GENOMICS AND BIOINFORMATICS, 2023, 5 (01)