Interpretable generative deep learning: an illustration with single cell gene expression data

Cited by: 8
Authors
Treppner, Martin [1, 2]
Binder, Harald [3 ]
Hess, Moritz [3 ]
Affiliations
[1] Univ Freiburg, Fac Med, Inst Med Biometry & Stat, Stefan Meier Str 26, D-79104 Freiburg, Germany
[2] Univ Freiburg, Med Ctr, Stefan Meier Str 26, D-79104 Freiburg, Germany
[3] Univ Freiburg, Freiburg Ctr Data Anal & Modeling, D-79104 Freiburg, Germany
Keywords
Explainable AI; Deep learning; Generative model; Dimension reduction; Neural networks
DOI
10.1007/s00439-021-02417-6
Chinese Library Classification (CLC) number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Deep generative models can learn the underlying structure, such as pathways or gene programs, from omics data. We provide an introduction as well as an overview of such techniques, specifically illustrating their use with single-cell gene expression data. For example, the low dimensional latent representations offered by various approaches, such as variational auto-encoders, are useful to get a better understanding of the relations between observed gene expressions and experimental factors or phenotypes. Furthermore, by providing a generative model for the latent and observed variables, deep generative models can generate synthetic observations, which allow us to assess the uncertainty in the learned representations. While deep generative models are useful to learn the structure of high-dimensional omics data by efficiently capturing non-linear dependencies between genes, they are sometimes difficult to interpret due to their neural network building blocks. More precisely, to understand the relationship between learned latent variables and observed variables, e.g., gene transcript abundances and external phenotypes, is difficult. Therefore, we also illustrate current approaches that allow us to infer the relationship between learned latent variables and observed variables as well as external phenotypes. Thereby, we render deep learning approaches more interpretable. In an application with single-cell gene expression data, we demonstrate the utility of the discussed methods.
Pages: 1481-1498
Page count: 18