Bayesian variable selection for multivariate zero-inflated models: Application to microbiome count data

Times cited: 10
Authors
Lee, Kyu Ha [1 ,2 ]
Coull, Brent A. [3 ]
Moscicki, Anna-Barbara [4 ]
Paster, Bruce J. [1 ,5 ]
Starr, Jacqueline R. [1 ,2 ]
Affiliations
[1] Forsyth Inst, 245 First St, Cambridge, MA 02142 USA
[2] Harvard Sch Dent Med, Dept Oral Hlth Policy & Epidemiol, Boston, MA 02115 USA
[3] Harvard TH Chan Sch Publ Hlth, Dept Biostat, 665 Huntington Ave, Boston, MA 02115 USA
[4] Univ Calif Los Angeles, David Geffen Sch Med, Dept Pediat, 10833 Le Conte Ave, Los Angeles, CA 90095 USA
[5] Harvard Sch Dent Med, Dept Oral Med Infect & Immun, Boston, MA 02115 USA
Funding
National Institutes of Health (USA);
Keywords
Bayesian variable selection; Markov chain Monte Carlo; Microbiome sequencing data; Multivariate analysis; Zero-inflated models; POISSON REGRESSION; DENTAL-CARIES; PREVALENCE; HEALTH;
DOI
10.1093/biostatistics/kxy067
Chinese Library Classification (CLC) number
Q [Biological sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Microorganisms play critical roles in human health and disease. They live in diverse communities in which they interact synergistically or antagonistically. Thus, for estimating microbial associations with clinical covariates, such as treatment effects, joint (multivariate) statistical models are preferred. Multivariate models allow one to estimate and exploit complex interdependencies among multiple taxa, yielding more powerful tests of exposure or treatment effects than application of taxon-specific univariate analyses. Analysis of microbial count data also requires special attention because the data commonly exhibit zero inflation, i.e., more zeros than expected from a standard count distribution. To meet these needs, we developed a Bayesian variable selection model for multivariate count data with excess zeros that incorporates information on the covariance structure of the outcomes (counts for multiple taxa) while estimating associations with the mean levels of these outcomes. Though there has been much work on zero-inflated models for longitudinal data, little attention has been given to high-dimensional multivariate zero-inflated data modeled via a general correlation structure. Through simulation, we compared the performance of the proposed method to that of existing univariate approaches, for both the binary ("excess zero") and count parts of the model. When outcomes were correlated, the proposed variable selection method maintained type I error while boosting the ability to identify true associations in the binary component of the model. For the count part of the model, in some scenarios the univariate method had higher power than the multivariate approach. This higher power came at the cost of a highly inflated false discovery rate, which was not observed with the proposed multivariate method. We applied the approach to oral microbiome data from the Pediatric HIV/AIDS Cohort Oral Health Study and identified five (of 44) species associated with HIV infection.
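To make "zero inflation" and the two model parts (binary "excess zero" and count) concrete, here is a minimal sketch of a univariate zero-inflated Poisson distribution. It is an illustration only, not the multivariate Bayesian model of the paper: the function names and the parameters `lam` (Poisson mean) and `pi` (probability of a structural zero) are assumptions introduced for this example.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for a standard Poisson count with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson (illustrative assumption).

    With probability pi the observation is a structural ("excess") zero
    (the binary part); otherwise it is drawn from Poisson(lam) (the count part).
    """
    count_part = (1.0 - pi) * poisson_pmf(k, lam)
    return pi + count_part if k == 0 else count_part


def zip_mean(lam, pi):
    """Mean of the zero-inflated Poisson: only the count part contributes."""
    return (1.0 - pi) * lam


if __name__ == "__main__":
    lam, pi = 2.0, 0.3  # hypothetical values chosen for illustration
    print("P(Y=0), Poisson:", poisson_pmf(0, lam))
    print("P(Y=0), zero-inflated Poisson:", zip_pmf(0, lam, pi))
    print("Mean, zero-inflated Poisson:", zip_mean(lam, pi))
```

With any `pi > 0`, the zero-inflated distribution puts more mass at zero than the plain Poisson with the same `lam`, which is the "more zeros than expected from a standard count distribution" behaviour described in the abstract.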
Pages: 499-517
Number of pages: 19
Related papers
  • [1] A Bayesian nonparametric analysis for zero-inflated multivariate count data with application to microbiome study
    Shuler, Kurtis
    Verbanic, Samuel
    Chen, Irene A.
    Lee, Juhee
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2021, 70 (04) : 961 - 979
  • [2] Zero-inflated models with application to spatial count data
    Agarwal, DK
    Gelfand, AE
    Citron-Pousty, S
    ENVIRONMENTAL AND ECOLOGICAL STATISTICS, 2002, 9 (04) : 341 - 355
  • [3] Zero-inflated models with application to spatial count data
    Deepak K. Agarwal
    Alan E. Gelfand
    Steven Citron-Pousty
    Environmental and Ecological Statistics, 2002, 9 : 341 - 355
  • [4] Assessment and Selection of Competing Models for Zero-Inflated Microbiome Data
    Xu, Lizhen
    Paterson, Andrew D.
    Turpin, Williams
    Xu, Wei
    PLOS ONE, 2015, 10 (07)
  • [5] Estimation and selection for spatial zero-inflated count models
    Shen, Chung-Wei
    Chen, Chun-Shu
    ENVIRONMETRICS, 2024, 35 (04)
  • [6] Infants' gut microbiome data: A Bayesian Marginal Zero-inflated Negative Binomial regression model for multivariate analyses of count data
    Hajihosseini, Morteza
    Amini, Payam
    Saidi-Mehrabad, Alireza
    Dinu, Irina
    COMPUTATIONAL AND STRUCTURAL BIOTECHNOLOGY JOURNAL, 2023, 21 : 1621 - 1629
  • [7] Variable selection approach for zero-inflated count data via adaptive lasso
    Zeng, Ping
    Wei, Yongyue
    Zhao, Yang
    Liu, Jin
    Liu, Liya
    Zhang, Ruyang
    Gou, Jianwei
    Huang, Shuiping
    Chen, Feng
    JOURNAL OF APPLIED STATISTICS, 2014, 41 (04) : 879 - 894
  • [8] BAYESIAN MIXED EFFECTS MODELS FOR ZERO-INFLATED COMPOSITIONS IN MICROBIOME DATA ANALYSIS
    Ren, Boyu
    Bacallado, Sergio
    Favaro, Stefano
    Vatanen, Tommi
    Huttenhower, Curtis
    Trippa, Lorenzo
    ANNALS OF APPLIED STATISTICS, 2020, 14 (01) : 494 - 517
  • [9] A Bayesian Analysis of Zero-inflated Count Data: An Application to Youth Fitness Survey
    Lu, Liying
    Fu, Yingzi
    Chu, Peixiao
    Zhang, Xiaolin
    2014 TENTH INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND SECURITY (CIS), 2014 : 699 - 703
  • [10] What belongs where? Variable selection for zero-inflated count models with an application to the demand for health care
    Jochmann, Markus
    COMPUTATIONAL STATISTICS, 2013, 28 (05) : 1947 - 1964