Bayesian compositional generalized linear models for analyzing microbiome data

Citations: 2
Authors
Zhang, Li [1]
Zhang, Xinyan [2]
Yi, Nengjun [1]
Affiliations
[1] Univ Alabama Birmingham, Dept Biostat, Birmingham, AL 35294 USA
[2] Kennesaw State Univ, Sch Data Sci & Analyt, Kennesaw, GA USA
Keywords
Bayesian models; compositional data; MCMC; microbiome; sum-to-zero restriction; STATISTICAL-ANALYSIS; GUT MICROBIOTA; REGRESSION;
DOI
10.1002/sim.9946
Chinese Library Classification
Q [Biological Sciences];
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
The crucial impact of the microbiome on human health and disease has gained significant scientific attention. Researchers seek to connect microbiome features with health conditions, aiming to predict diseases and develop personalized medicine strategies. However, three aspects of microbiome data limit the practicality of conventional models. First, the observed data are compositional, because the counts within each sample are bound by a fixed-sum constraint. Second, microbiome data are often high-dimensional, with more variables than available samples. Third, microbiome features exhibiting phenotypical similarity usually have a similar influence on the response variable. To address these challenges, we propose Bayesian compositional generalized linear models for analyzing microbiome data (BCGLM), with a structured regularized horseshoe prior for the compositional coefficients and a soft sum-to-zero restriction on the coefficients imposed through the prior distribution. We fitted the proposed models using Markov chain Monte Carlo (MCMC) algorithms with the R package rstan. The performance of the proposed method was assessed in extensive simulation studies. The simulation results show that our approach outperforms existing methods, with more accurate coefficient estimates and lower prediction error. We also applied the proposed method to a microbiome study to find microorganisms linked to inflammatory bowel disease (IBD). To make this work reproducible, the code and data used in this article are available at .
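The role of the sum-to-zero restriction can be illustrated with a small numerical sketch. It is not the authors' rstan implementation and it omits the structured regularized horseshoe prior. It only shows, under assumed names and values (`log_composition`, `soft_sum_to_zero_log_prior`, the pseudo-count of 0.5, and the prior scale `sd=0.001` are all hypothetical), why coefficients that sum to zero make the linear predictor insensitive to the per-sample normalizing total, and how a tight normal prior on the coefficient sum could act as a "soft" version of that restriction.

```python
import numpy as np


def log_composition(counts, pseudo_count=0.5):
    """Turn a samples-by-taxa count matrix into log relative abundances.

    The pseudo-count (a hypothetical choice) avoids log(0) for absent taxa.
    """
    shifted = np.asarray(counts, dtype=float) + pseudo_count
    proportions = shifted / shifted.sum(axis=1, keepdims=True)
    return np.log(proportions)


def soft_sum_to_zero_log_prior(beta, sd=0.001):
    """Unnormalized log density of sum(beta) ~ Normal(0, sd).

    A small sd (hypothetical value) pulls the coefficient sum toward zero
    without enforcing the restriction exactly.
    """
    return -0.5 * (np.sum(beta) / sd) ** 2


rng = np.random.default_rng(0)
counts = rng.poisson(lam=20, size=(5, 4))   # 5 samples, 4 taxa (toy data)
beta = np.array([1.0, -0.5, -0.5, 0.0])     # coefficients that sum to zero

# Linear predictor from log proportions versus from unnormalized log counts.
eta_prop = log_composition(counts) @ beta
eta_raw = np.log(counts + 0.5) @ beta

# With sum(beta) == 0 the per-sample log total cancels, so both agree.
same_predictor = np.allclose(eta_prop, eta_raw)

# The soft prior does not penalize a zero sum, but penalizes a nonzero one.
penalty_at_zero_sum = soft_sum_to_zero_log_prior(beta)
penalty_at_unit_sum = soft_sum_to_zero_log_prior(np.array([1.0, 0.0, 0.0, 0.0]))
```

In a full Bayesian fit, a term like `soft_sum_to_zero_log_prior` would be added to the log posterior alongside the likelihood and the shrinkage prior on the individual coefficients.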
Pages: 141-155
Number of pages: 15