Interpretable metric learning in comparative metagenomics: The adaptive Haar-like distance

Times cited: 1
Authors
Gorman, Evan D. [1]
Lladser, Manuel E. [1]
Affiliation
[1] Univ Colorado, Dept Appl Math, Boulder, CO 80309 USA
Funding
U.S. National Science Foundation
Keywords
BACTERIA; MICROBIOME; COMMUNITIES; DIVERSITY; DATABASE; SPACE
DOI
10.1371/journal.pcbi.1011543
Chinese Library Classification
Q5 [Biochemistry]
Discipline Classification Codes
071010; 081704
Abstract
Random forests have emerged as a promising tool in comparative metagenomics because they can predict environmental characteristics from microbial composition in datasets where beta-diversity metrics fail to reveal meaningful relationships between samples. Despite this efficacy, however, their predictions come with little biological insight, which can hinder scientific progress. To overcome this limitation, we leverage a geometric characterization of random forests to introduce a data-driven phylogenetic beta-diversity metric, the adaptive Haar-like distance. This metric assigns a weight to each internal node (i.e., split or bifurcation) of a reference phylogeny, indicating how important that node is for discerning environmental samples based on their microbial composition. In addition, a weighted nearest-neighbors classifier built on the adaptive metric can serve as a proxy for the random forest, with accuracy on par with that of the original forest and of another state-of-the-art classifier, CoDaCoRe. As shown on datasets from diverse microbial environments, the new metric and classifier significantly enhance the biological interpretability and visualization of high-dimensional metagenomic samples.

Traditional phylogenetic beta-diversity metrics, particularly weighted and unweighted UniFrac, have had great success in comparing and visualizing high-dimensional metagenomic samples. Nonetheless, these metrics rely on pre-established biological assumptions that may fail to capture key microbial players or relationships between some samples. In contrast, supervised machine learning algorithms such as random forests can often capture intricate relationships between microbial samples, but unveiling these relationships is challenging because of the algorithms' complex inner mechanisms. The adaptive Haar-like distance combines the merits of beta-diversity metrics and random forests, allowing precise, intuitive, and visual comparison of metagenomic samples and offering valuable scientific insight into the distinctions and associations among microbial environments.
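The abstract describes the metric only in words, so the following is a minimal sketch for intuition, not the authors' implementation. It assumes the adaptive distance has the form d_w(x, y) = sqrt(sum over internal nodes v of w_v * (<x, phi_v> - <y, phi_v>)^2), where phi_v is the standard orthonormal Haar-like wavelet at node v of a binary reference tree and w_v >= 0 is the node weight. In the paper the weights come from a trained random forest; here they are hand-picked placeholders. The tree encoding (TREE), all function names, and the inverse-distance voting rule in weighted_knn_predict are hypothetical choices made for illustration.

```python
import math
from collections import Counter

# Hypothetical tree encoding (an assumption, not the paper's format):
# each internal node maps to its (left, right) children; any name that is
# not a key is treated as a leaf (a taxon).
TREE = {
    "root": ("n1", "n2"),
    "n1": ("A", "B"),
    "n2": ("C", "D"),
}


def leaves(tree, node):
    """Return the list of leaves below `node` (a leaf returns itself)."""
    if node not in tree:
        return [node]
    left, right = tree[node]
    return leaves(tree, left) + leaves(tree, right)


def haar_coefficient(tree, node, sample):
    """Inner product of `sample` with the orthonormal Haar-like wavelet at `node`.

    Assumed wavelet: +sqrt(r / (l * (l + r))) on the l leaves of the left
    subtree and -sqrt(l / (r * (l + r))) on the r leaves of the right subtree.
    `sample` maps taxa to relative abundances; missing taxa count as 0.
    """
    left, right = tree[node]
    left_leaves, right_leaves = leaves(tree, left), leaves(tree, right)
    l, r = len(left_leaves), len(right_leaves)
    left_mass = sum(sample.get(t, 0.0) for t in left_leaves)
    right_mass = sum(sample.get(t, 0.0) for t in right_leaves)
    return (left_mass * math.sqrt(r / (l * (l + r)))
            - right_mass * math.sqrt(l / (r * (l + r))))


def weighted_haar_distance(tree, weights, x, y):
    """sqrt(sum_v w_v * (<x, phi_v> - <y, phi_v>)^2) over the weighted nodes v."""
    total = 0.0
    for node, w in weights.items():
        delta = haar_coefficient(tree, node, x) - haar_coefficient(tree, node, y)
        total += w * delta ** 2
    return math.sqrt(total)


def weighted_knn_predict(tree, weights, train, labels, query, k=3):
    """Vote among the k nearest training samples, each vote weighted by 1/distance."""
    dists = sorted(
        (weighted_haar_distance(tree, weights, s, query), lab)
        for s, lab in zip(train, labels)
    )
    votes = Counter()
    for d, lab in dists[:k]:
        votes[lab] += 1.0 / (d + 1e-12)  # small epsilon avoids division by zero
    return votes.most_common(1)[0][0]


# Usage: placeholder weights that emphasize the root split.
weights = {"root": 1.0, "n1": 0.1, "n2": 0.1}
train = [{"A": 1.0}, {"B": 1.0}, {"C": 1.0}, {"D": 1.0}]
labels = ["left", "left", "right", "right"]
print(weighted_knn_predict(TREE, weights, train, labels, {"A": 0.6, "B": 0.4}))
```

With every node weight set to 1, the wavelets form an orthonormal basis for zero-sum vectors, so the distance between two relative-abundance profiles reduces to their Euclidean distance. Shrinking or zeroing individual w_v is what lets a learned metric emphasize the splits that best separate environments.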
Pages: 32