Investigating differential abundance methods in microbiome data: A benchmark study

Cited by: 36
Authors
Cappellato, Marco [1]
Baruzzo, Giacomo [1]
Di Camillo, Barbara [1,2]
Affiliations
[1] Univ Padua, Dept Informat Engn, Padua, Italy
[2] Univ Padua, Dept Comparat Biomed & Food Sci, Padua, Italy
Keywords
STATISTICAL-ANALYSIS; EXPRESSION; DIVERSITY;
DOI
10.1371/journal.pcbi.1010467
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
The development of increasingly efficient and cost-effective high-throughput DNA sequencing techniques has made it easier to study complex microbial systems. Recently, researchers have shown great interest in studying the microorganisms that characterise different ecological niches. Differential abundance analysis aims to find differences in the abundance of each taxon between two classes of subjects or samples, assigning a significance value to each comparison. Several bioinformatic methods have been developed specifically for this task, taking into account the challenges of microbiome data, such as sparsity, differing sequencing depth across samples, and compositionality. Differential abundance analysis has led to important conclusions in different fields, from health to the environment. However, the lack of a known biological truth makes it difficult to validate the results obtained. In this work we exploit metaSPARSim, a microbial sequencing count data simulator, to simulate data with differentially abundant features between experimental groups. We perform a complete comparison of recently developed and established methods on a common benchmark, paying particular attention to the reliability of both the simulated scenarios and the evaluation metrics. The performance overview covers numerous scenarios, studying how methods' results are affected by the main covariates: sample size, percentage of differentially abundant features, sequencing depth, feature variability, normalisation approach and ecological niche. Mainly, we find that methods show good control of the type I error and, generally, also of the false discovery rate at high sample size, while recall seems to depend on the dataset and sample size.
Pages: 33