A robust approach for identifying differentially abundant features in metagenomic samples

Cited by: 48
Authors
Sohn, Michael B. [1]
Du, Ruofei [2]
An, Lingling [1, 2]
Affiliations
[1] Univ Arizona, Interdisciplinary Program Stat, Tucson, AZ 85721 USA
[2] Univ Arizona, Dept Agr & Biosyst Engn, Tucson, AZ 85721 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA); U.S. Department of Agriculture;
Keywords
EXPRESSION ANALYSIS; BOTULINUM TOXIN; GUT MICROBIOTA;
DOI
10.1093/bioinformatics/btv165
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Motivation: The analysis of differential abundance for features (e.g. species or genes) can give us a better understanding of microbial communities and of their behavior. However, it can also mislead us about the characteristics of microbial communities if the abundances or counts of features on different scales are not properly normalized within and between communities before the differential abundance analysis. Normalization methods used in differential analysis typically try to adjust counts on different scales to a common scale using the total sum, mean or median of representative features across all samples. These methods often yield undesirable results when the difference in total counts of differentially abundant features (DAFs) across conditions is large. Results: We develop a novel method, Ratio Approach for Identifying Differential Abundance (RAIDA), which utilizes the ratio between features in a modified zero-inflated lognormal model. RAIDA removes possible problems associated with counts on different scales within and between conditions. As a result, its performance is not affected by the size of the difference in total abundances of DAFs across conditions. In comprehensive simulation studies, our method is consistently powerful, and in some situations RAIDA greatly surpasses other existing methods. We also apply RAIDA to real datasets of type II diabetes and find interesting results consistent with previous reports.
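The abstract's central argument, that rescaling each sample by its total can manufacture differences when a few DAFs change a lot, can be seen in a small numeric sketch. The counts below are hypothetical, and the reference feature (index 4) is simply assumed to be non-differential; RAIDA itself has to identify such features and fits a modified zero-inflated lognormal model, neither of which this sketch attempts. It only contrasts total-sum scaling with a plain ratio to one reference feature, and the function names are illustrative, not part of any RAIDA package.

```python
from typing import List

Counts = List[float]


def total_sum_scale(sample: Counts) -> Counts:
    """Relative abundance: each count divided by the sample's total count."""
    total = sum(sample)
    return [c / total for c in sample]


def ratio_to_reference(sample: Counts, ref: int) -> Counts:
    """Each count divided by the count of a reference feature in the same sample.

    Assumes the reference feature is NOT differentially abundant (hypothetical choice).
    """
    return [c / sample[ref] for c in sample]


def fold_changes(a: Counts, b: Counts) -> Counts:
    """Per-feature fold change from condition A to condition B."""
    return [y / x for x, y in zip(a, b)]


# Hypothetical toy data: five features, one sample per condition.
# Only feature 0 is truly differentially abundant (10x higher in B).
cond_a = [100.0, 100.0, 100.0, 100.0, 100.0]
cond_b = [1000.0, 100.0, 100.0, 100.0, 100.0]

tss_fc = fold_changes(total_sum_scale(cond_a), total_sum_scale(cond_b))
ratio_fc = fold_changes(ratio_to_reference(cond_a, ref=4),
                        ratio_to_reference(cond_b, ref=4))

print([round(x, 3) for x in tss_fc])    # [3.571, 0.357, 0.357, 0.357, 0.357]
print([round(x, 3) for x in ratio_fc])  # [10.0, 1.0, 1.0, 1.0, 1.0]
```

Under total-sum scaling the four unchanged features appear to fall to about 36% of their original level only because feature 0 inflates the sample total, and the true tenfold increase is understated as roughly 3.6-fold. The ratios to an unchanged feature recover the tenfold change for feature 0 and no change elsewhere, which is the intuition behind working with ratios between features.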
Pages: 2269-2275
Number of pages: 7
Related papers (24 in total)
[1] Aherne FJ. Kybernetika, 1998, 34: 363.
[2] Aitchison J. The Statistical Analysis of Compositional Data [M]. 1986, p. 416. DOI: 10.1007/978-94-009-4109-0.
[3] Anders S, Huber W. Differential expression analysis for sequence count data [J]. Genome Biology, 2010, 11(10).
[4] Benjamini Y, Hochberg Y. Controlling the false discovery rate: a practical and powerful approach to multiple testing [J]. Journal of the Royal Statistical Society Series B (Statistical Methodology), 1995, 57(1): 289-300.
[5] Bien J, Tibshirani R. Hierarchical clustering with prototypes via minimax linkage [J]. Journal of the American Statistical Association, 2011, 106(495): 1075-1084.
[6] Byrd RH, Lu PH, Nocedal J, Zhu CY. A limited memory algorithm for bound constrained optimization [J]. SIAM Journal on Scientific Computing, 1995, 16(5): 1190-1208.
[7] Coleman GB, Andrews HC. Image segmentation by clustering [J]. Proceedings of the IEEE, 1979, 67(5): 773-785.
[8] Dillies MA, Rau A, Aubert J, Hennequet-Antier C, Jeanmougin M, Servant N, Keime C, Marot G, Castel D, Estelle J, Guernec G, Jagla B, Jouneau L, Laloe D, Le Gall C, Schaeffer B, Le Crom S, Guedj M, Jaffrezic F. A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis [J]. Briefings in Bioinformatics, 2013, 14(6): 671-683.
[9] Hughes JB, Hellmann JJ, Ricketts TH, Bohannan BJM. Counting the uncountable: statistical approaches to estimating microbial diversity [J]. Applied and Environmental Microbiology, 2001, 67(10): 4399-4406.
[10] Kailath T. Divergence and Bhattacharyya distance measures in signal selection [J]. IEEE Transactions on Communication Technology, 1967, CO15(1): 52-&.