Robust Differential Abundance Analysis of Microbiome Sequencing Data

Times cited: 3
Authors
Li, Guanxun [1]
Yang, Lu [2]
Chen, Jun [2]
Zhang, Xianyang [1]
Affiliations
[1] Texas A&M Univ, Dept Stat, College Stn, TX 77843 USA
[2] Mayo Clin, Dept Quantitat Hlth Sci, Rochester, MN 55905 USA
Keywords
compositional data; differential abundance analysis; Huber regression; robustness; winsorization; regression
DOI
10.3390/genes14112000
Chinese Library Classification (CLC) number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract
It is well known that microbiome data are riddled with outliers and have heavy-tailed distributions, but the impact of outliers and heavy-tailedness has yet to be examined systematically. This paper investigates that impact on differential abundance analysis (DAA) using the Linear models for Differential Abundance analysis (LinDA) method and proposes effective strategies to mitigate it. The presence of outliers and heavy-tailedness can significantly decrease the power of LinDA. We investigate various techniques to address them, including generalizing LinDA into a more flexible framework that allows the use of robust regression, and winsorizing the data before applying LinDA. Our extensive numerical experiments and real-data analyses demonstrate that robust Huber regression has the best overall performance in addressing outliers and heavy-tailedness.
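The abstract contrasts a standard least-squares fit with two remedies: Huber regression and winsorization before fitting. The toy sketch below, written in pure Python, shows why a single extreme sample can inflate a least-squares group effect and how both remedies damp it. It assumes one taxon whose log-scale abundance is regressed on a binary group indicator. It is not the authors' implementation and omits LinDA's handling of compositionality. The tuning constant c = 1.345, the median-absolute-residual scale, the 10%/90% winsorization cutoffs, the helper names (`weighted_ls`, `huber_fit`, `winsorize`) and the data are all illustrative assumptions.

```python
import statistics

# Toy illustration only: one taxon, log-scale abundances, binary group indicator.
# Constants (c = 1.345, residual scale rule, 10%/90% cutoffs) are assumptions,
# not values taken from the paper.


def weighted_ls(x, y, w):
    """Weighted least squares for y = a + b * x with a single covariate."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return ybar - b * xbar, b


def huber_fit(x, y, c=1.345, max_iter=100, tol=1e-10):
    """Huber M-estimate via iteratively reweighted least squares (IRLS)."""
    a, b = weighted_ls(x, y, [1.0] * len(y))  # start from the ordinary LS fit
    for _ in range(max_iter):
        resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
        # Robust residual scale: median absolute residual / 0.6745.
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0:
            break
        # Huber weights: 1 inside the threshold c*scale, c*scale/|r| outside.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        a_new, b_new = weighted_ls(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b


def quantile(sorted_vals, q):
    """Linearly interpolated empirical quantile of pre-sorted values."""
    pos = q * (len(sorted_vals) - 1)
    i = int(pos)
    if i + 1 >= len(sorted_vals):
        return sorted_vals[-1]
    return sorted_vals[i] + (pos - i) * (sorted_vals[i + 1] - sorted_vals[i])


def winsorize(values, lower_q=0.10, upper_q=0.90):
    """Clip values to their empirical lower/upper quantiles."""
    s = sorted(values)
    lo, hi = quantile(s, lower_q), quantile(s, upper_q)
    return [min(max(v, lo), hi) for v in values]


# Hypothetical data: 6 control and 6 case samples; the last case is an outlier.
group = [0] * 6 + [1] * 6
log_abund = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8,
             2.0, 2.1, 1.9, 2.2, 2.0, 12.0]

ols_a, ols_b = weighted_ls(group, log_abund, [1.0] * len(group))
hub_a, hub_b = huber_fit(group, log_abund)
win_a, win_b = weighted_ls(group, winsorize(log_abund), [1.0] * len(group))

print(f"OLS slope:            {ols_b:.3f}")
print(f"Huber slope:          {hub_b:.3f}")
print(f"Winsorized OLS slope: {win_b:.3f}")
```

On this toy data the least-squares slope is pulled up to 2.7 by the one extreme case sample. The Huber and winsorized estimates stay near the shift of roughly 1.0 shared by the other samples. Note that winsorization also alters non-outlying values (here 0.8 and 0.9 are raised to the lower cutoff), whereas Huber weighting leaves the data untouched and only downweights large residuals.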
Pages: 23