A Bayesian multivariate mixture model for high throughput spatial transcriptomics

Cited by: 5
Authors
Allen, Carter [1, 4]
Chang, Yuzhou [1, 4]
Neelon, Brian [2]
Chang, Won [3]
Kim, Hang J. [3]
Li, Zihai [4]
Ma, Qin [1, 4]
Chung, Dongjun [1, 4]
Affiliations
[1] Ohio State Univ, Dept Biomed Informat, Columbus, OH 43210 USA
[2] Med Univ South Carolina, Dept Publ Hlth Sci, Charleston, SC 29425 USA
[3] Univ Cincinnati, Div Stat & Data Sci, Cincinnati, OH USA
[4] Ohio State Univ, Comprehens Canc Ctr, Pelotonia Inst Immunooncol, Columbus, OH 43210 USA
Keywords
Bayesian models; conditionally autoregressive models; mixture models; skew-normal; spatial transcriptomics; SINGLE-CELL; INFERENCE
DOI
10.1111/biom.13727
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
High throughput spatial transcriptomics (HST) is a rapidly emerging class of experimental technologies that allow for profiling gene expression in tissue samples at or near single-cell resolution while retaining the spatial location of each sequencing unit within the tissue sample. Through analyzing HST data, we seek to identify sub-populations of cells within a tissue sample that may inform biological phenomena. Existing computational methods either ignore the spatial heterogeneity in gene expression profiles, fail to account for important statistical features such as skewness, or are heuristic-based network clustering methods that lack the inferential benefits of statistical modeling. To address this gap, we develop SPRUCE: a Bayesian spatial multivariate finite mixture model based on multivariate skew-normal distributions, which is capable of identifying distinct cellular sub-populations in HST data. We further implement a novel combination of Polya-Gamma data augmentation and spatial random effects to infer spatially correlated mixture component membership probabilities without relying on approximate inference techniques. Via a simulation study, we demonstrate the detrimental inferential effects of ignoring skewness or spatial correlation in HST data. Using publicly available human brain HST data, SPRUCE outperforms existing methods in recovering expertly annotated brain layers. Finally, our application of SPRUCE to human breast cancer HST data indicates that SPRUCE can distinguish distinct cell populations within the tumor microenvironment. An R package spruce for fitting the proposed models is available through The Comprehensive R Archive Network.
Pages: 1775-1787
Number of pages: 13
Related papers
41 in total
  • [1] 10X Genomics, 2019, MOUS BRAIN SER SEC 2
  • [2] 10x Genomics, 2020, MOUS BRAIN SER SEC 1, P1
  • [3] 10X Genomics, 2020, MOUS KIDN SECT COR S
  • [4] 10X Genomics, 2020, HUM BREAST CANC BLOC
  • [5] NEW LOOK AT STATISTICAL-MODEL IDENTIFICATION
    AKAIKE, H
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1974, AC19 (06) : 716 - 723
  • [6] A Bayesian multivariate mixture model for skewed longitudinal data with intermittent missing observations: An application to infant motor development
    Allen, Carter
    Benjamin-Neelon, Sara E.
    Neelon, Brian
    [J]. BIOMETRICS, 2021, 77 (02) : 675 - 688
  • [7] Ann Phoebe, 2018, Oncotarget, V9, P23114, DOI 10.18632/oncotarget.25225
  • [8] The multivariate skew-normal distribution
    Azzalini, A
    DallaValle, A
    [J]. BIOMETRIKA, 1996, 83 (04) : 715 - 726
  • [9] Combined single-cell and spatial transcriptomics reveal the molecular, cellular and spatial bone marrow niche organization
    Baccin, Chiara
    Al-Sabah, Jude
    Velten, Lars
    Helbling, Patrick M.
    Gruenschlaeger, Florian
    Hernandez-Malmierca, Pablo
    Nombela-Arrieta, Cesar
    Steinmetz, Lars M.
    Trumpp, Andreas
    Haas, Simon
    [J]. NATURE CELL BIOLOGY, 2020, 22 (01) : 38 - +
  • [10] Banerjee S., 2014, Hierarchical modeling and analysis for spatial data