GrammR: graphical representation and modeling of count data with application in metagenomics

Cited by: 5
Authors
Ayyala, Deepak Nag [1]
Lin, Shili [1]
Affiliations
[1] Ohio State Univ, Dept Stat, Columbus, OH 43210 USA
Funding
National Science Foundation (USA);
Keywords
GUT MICROBIOME;
DOI
10.1093/bioinformatics/btv032
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: Microbiota composition has important implications for human health, including obesity and other conditions. It is therefore important to cluster samples or taxa in order to visualize and discover community substructures. Graphical representation of metagenomic count data relies on two components: the measure of dissimilarity between samples or taxa, and the algorithm used to estimate coordinates for studying microbiota communities. UniFrac is a dissimilarity measure commonly used in metagenomic research, but it requires a phylogenetic tree. Principal coordinate analysis (PCoA) is a popular algorithm for estimating two-dimensional (2D) coordinates for graphical representation, although alternative and higher-dimensional representations may reveal underlying community substructures that are invisible in 2D representations.
Results: We adapt a new measure of dissimilarity, the penalized Kendall's tau-distance, which does not depend on a phylogenetic tree and is hence more readily applicable to a wider class of problems. Further, we propose using metric multidimensional scaling (MDS) as an alternative to PCoA for graphical representation. We then devise a novel procedure for determining the number of clusters in conjunction with PAM (mPAM). We show superior performance with higher-dimensional representations. We further demonstrate the utility of mPAM for accurate clustering analysis, especially with higher-dimensional MDS models. Applications to two human microbiota datasets show that a higher-dimensional analysis yields greater insight into subcommunity structure.
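To make the tree-free dissimilarity idea concrete, the following is a minimal sketch of a Kendall tau-distance with a penalty for ties, computed between count vectors of the same taxa. It is not the GrammR implementation. The tie handling (a penalty p added when a pair of taxa is tied in one sample but not in the other) follows the common partial-ranking convention and is an assumption here, as are the function names, the default p = 0.5, and the toy counts. The abstract does not give the paper's exact definition, which may differ.

```python
from itertools import combinations


def penalized_kendall_tau(x, y, p=0.5):
    """Kendall tau-distance between two count vectors with a tie penalty.

    Assumed convention (not taken from the paper): a pair of taxa adds
    1 if the two samples order it in opposite directions, p if it is
    tied in exactly one sample, and 0 otherwise.
    """
    if len(x) != len(y):
        raise ValueError("count vectors must have the same length")

    def sign(a, b):
        # +1 if a > b, -1 if a < b, 0 if tied
        return (a > b) - (a < b)

    dist = 0.0
    for i, j in combinations(range(len(x)), 2):
        sx, sy = sign(x[i], x[j]), sign(y[i], y[j])
        if sx == 0 and sy == 0:
            continue          # tied in both samples: no contribution
        if sx == 0 or sy == 0:
            dist += p         # tied in exactly one sample: penalty
        elif sx != sy:
            dist += 1.0       # strictly discordant pair
    return dist


def distance_matrix(samples, p=0.5):
    """Symmetric matrix of pairwise distances between samples."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for a, b in combinations(range(n), 2):
        d[a][b] = d[b][a] = penalized_kendall_tau(samples[a], samples[b], p)
    return d


# Hypothetical counts: three samples (rows) over four taxa (columns).
samples = [
    [10, 5, 0, 0],
    [8, 9, 1, 0],
    [0, 0, 7, 3],
]
D = distance_matrix(samples)
for row in D:
    print(row)
```

A matrix like D is the kind of input that a metric MDS or PCoA routine would embed in two or more dimensions, and that PAM would then cluster. Only the relative order of counts within each sample enters the distance, so no phylogenetic tree is needed.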
Pages: 1648-1654
Page count: 7