CellVGAE: an unsupervised scRNA-seq analysis workflow with graph attention networks

Cited by: 27
Authors
Buterez, David [1 ]
Bica, Ioana [2 ,3 ]
Tariq, Ifrah [4 ]
Andres-Terre, Helena [1 ]
Lio, Pietro [1 ]
Affiliations
[1] Univ Cambridge, Dept Comp Sci & Technol, Cambridge CB3 0FD, England
[2] Univ Oxford, Dept Engn Sci, Oxford OX1 3PJ, England
[3] Alan Turing Inst, London NW1 2DB, England
[4] MIT, Dept Biol Engn, Computat & Syst Biol Program, Cambridge, MA 02142 USA
Keywords
RNA-SEQ;
DOI
10.1093/bioinformatics/btab804
Chinese Library Classification (CLC)
Q5 [Biochemistry];
Discipline codes
071010; 081704
Abstract
Motivation: Single-cell RNA sequencing allows high-resolution views of individual cells for libraries of up to millions of samples, thus motivating the use of deep learning for analysis. In this study, we introduce the use of graph neural networks for the unsupervised exploration of scRNA-seq data by developing a variational graph autoencoder architecture with graph attention layers that operates directly on the connectivity between cells, focusing on dimensionality reduction and clustering. With the help of several case studies, we show that our model, named CellVGAE, can be effectively used for exploratory analysis even on challenging datasets, by extracting meaningful features from the data and providing the means to visualize and interpret different aspects of the model.
Results: We show that CellVGAE is more interpretable than existing scRNA-seq variational architectures by analysing the graph attention coefficients. By drawing parallels with other scRNA-seq studies on interpretability, we assess the validity of the relationships modelled by attention, and furthermore, we show that CellVGAE can intrinsically capture information such as pseudotime and NF-κB activation dynamics, the latter being a property that is not generally shared by existing neural alternatives. We then evaluate the dimensionality reduction and clustering performance on 9 difficult and well-annotated datasets by comparing with three leading neural and non-neural techniques, concluding that CellVGAE outperforms competing methods. Finally, we report a decrease in training times of up to 20× on a dataset of 1.3 million cells compared to existing deep learning architectures.
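As a rough illustration of the architecture the abstract describes (a variational graph autoencoder whose encoder uses graph attention layers on a cell-cell connectivity graph), the PyTorch Geometric sketch below shows one way such a model can be wired up. The layer sizes, the two shared attention layers before the mu/log-std heads, and the k-NN graph input are illustrative assumptions, not the authors' published implementation.

# Hedged sketch (not the CellVGAE source code): a minimal variational graph
# autoencoder with graph attention (GAT) layers, in the spirit of the
# architecture described in the abstract. Dimensions and graph construction
# are assumptions for illustration only.
import torch
from torch_geometric.nn import GATConv, VGAE

class GATEncoder(torch.nn.Module):
    """Two shared attention layers, then separate heads for mu and log-std."""
    def __init__(self, num_genes, hidden_dim=128, latent_dim=32, heads=4):
        super().__init__()
        self.gat1 = GATConv(num_genes, hidden_dim, heads=heads, concat=True)
        self.gat2 = GATConv(hidden_dim * heads, hidden_dim, heads=heads, concat=True)
        self.gat_mu = GATConv(hidden_dim * heads, latent_dim, heads=1, concat=False)
        self.gat_logstd = GATConv(hidden_dim * heads, latent_dim, heads=1, concat=False)

    def forward(self, x, edge_index):
        h = torch.relu(self.gat1(x, edge_index))
        h = torch.relu(self.gat2(h, edge_index))
        return self.gat_mu(h, edge_index), self.gat_logstd(h, edge_index)

# x: cells-by-genes expression matrix (e.g. log-normalised highly variable genes)
# edge_index: a k-NN cell-cell connectivity graph in COO format
model = VGAE(GATEncoder(num_genes=500))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

def train_step(x, edge_index):
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, edge_index)                    # latent cell embeddings
    loss = model.recon_loss(z, edge_index)             # edge reconstruction term
    loss = loss + (1.0 / x.size(0)) * model.kl_loss()  # KL regulariser
    loss.backward()
    optimizer.step()
    return z, loss.item()

In a workflow of this kind, the latent embeddings z would subsequently be clustered (for example with k-means or Leiden) and visualised, while the per-edge attention coefficients of the GAT layers can be inspected for interpretability, mirroring the analysis described in the abstract.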
Pages: 1277-1286
Page count: 10