Successful strategies for human microbiome data generation, storage and analyses

Cited by: 2
Author
Holmes, Susan [1]
Affiliation
[1] Stat Dept, Sequoia Hall, Stanford, CA 94305 USA
Keywords
Bayesian; bootstrap; experimental design; latent variable; longitudinal; statistical analyses; microbiome; visualization;
DOI
10.1007/s12038-019-9934-y
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification code
07; 0710; 09
Abstract
Current interest in the potential for clinical use of new tools for improving human health is now focused on techniques for the study of the human microbiome and its interaction with environmental and clinical covariates. This review outlines the use of statistical strategies that have been developed in past studies and can inform successful design and analyses of controlled perturbation experiments performed in the human microbiome. We carefully outline what the data are, their imperfections and how we need to transform, decontaminate and denoise them. We show how to identify the important unknown parameters and how we can leverage the variability we see to produce efficient models for prediction and uncertainty quantification. We encourage a reproducible strategy that builds on best practice principles that can be adapted for effective experimental design and reproducible workflows. Nonparametric, data-driven denoising strategies already provide the best strain identification and decontamination methods. Data-driven models can be combined with uncertainty quantification to provide reproducible aids to decision making in the clinical context, as long as careful, separate, registered confirmatory testing is undertaken. Here we provide guidelines for effective longitudinal studies and their analyses. Lessons learned along the way are that visualizations at every step can pinpoint problems and outliers, and that normalization and filtering improve power in downstream testing. We recommend collecting and binding the metadata and covariates to sample descriptors and recording complete computer scripts in an R markdown supplement, which can reduce opportunities for human error and enable collaborators and readers to replicate all the steps of the study.
Finally, we note that optimizing the bioinformatic and statistical workflow involves adopting a wait-and-see approach that is particularly effective in cases where the features such as 'mass spectrometry peaks' and metagenomic tables can only be partially annotated.
Pages: 6