Machine learning applications in microbial ecology, human microbiome studies, and environmental monitoring

Times cited: 138
Authors
Ghannam, Ryan B. [1]
Techtmann, Stephen M. [1]
Affiliation
[1] Michigan Technol Univ, Dept Biol Sci, Houghton, MI 49931 USA
Keywords
Machine learning; Marker genes; 16S rRNA; Metagenomics; Forensics; MULTIVARIATE DATA; INTERPRETABILITY; SEQUENCES; UNIFRAC; MODELS; TOOLS; GUIDE;
DOI
10.1016/j.csbj.2021.01.028
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Subject classification code
071010; 081704
Abstract
Advances in nucleic acid sequencing technology have expanded our ability to profile microbial diversity. The resulting large datasets of taxonomic and functional diversity are key to better understanding microbial ecology. Machine learning has proven to be a useful approach for analyzing microbial community data and making predictions about outcomes, including human and environmental health. Machine learning applied to microbial community profiles has been used to predict human disease states, environmental quality, and the presence of contamination in the environment, and to provide trace evidence in forensics. Machine learning has appeal as a powerful tool that can provide deep insights into microbial communities and identify patterns in microbial community data. However, machine learning models are often used as black boxes to predict a specific outcome, with little understanding of how the models arrive at their predictions. Complex machine learning algorithms often favor accuracy and performance at the expense of interpretability. To leverage machine learning for more translational research on the microbiome and to strengthen our ability to extract meaningful biological information, it is important for models to be interpretable. Here we review current trends in machine learning applications in microbial ecology, as well as some of the important challenges and opportunities for broader application of machine learning to understanding microbial communities. (C) 2021 The Authors. Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology.
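The abstract's point about black-box models versus interpretable ones can be made concrete with permutation importance, a model-agnostic way to ask which taxa a trained model actually relies on. What follows is a minimal sketch under stated assumptions, not the authors' method: the three-taxon count table, the "healthy"/"disease" labels, and every helper name (`relative_abundance`, `fit_centroids`, `permutation_importance`) are hypothetical, and a nearest-centroid classifier stands in for more complex models (such as random forests) purely to keep the example dependency-free.

```python
import random
from statistics import mean


def relative_abundance(counts):
    """Convert one sample's raw read counts into proportions that sum to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def fit_centroids(X, y):
    """Nearest-centroid classifier: the mean abundance profile of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
    )


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted label matches the true label."""
    return mean(1.0 if predict(centroids, x) == lab else 0.0 for x, lab in zip(X, y))


def permutation_importance(centroids, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy when one taxon's column is shuffled across samples."""
    rng = random.Random(seed)
    baseline = accuracy(centroids, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [x[j] for x in X]
            rng.shuffle(column)  # break the link between taxon j and the label
            X_perm = [x[:j] + [v] + x[j + 1:] for x, v in zip(X, column)]
            drops.append(baseline - accuracy(centroids, X_perm, y))
        importances.append(mean(drops))
    return importances


# Hypothetical toy data: 40 samples x 3 taxa. Only taxon 0's counts are
# generated differently between the two made-up groups; taxa 1 and 2 are noise.
rng = random.Random(42)
X, y = [], []
for i in range(40):
    label = "disease" if i % 2 else "healthy"
    signal = rng.randint(300, 400) if label == "disease" else rng.randint(20, 60)
    counts = [signal, rng.randint(100, 200), rng.randint(100, 200)]
    X.append(relative_abundance(counts))
    y.append(label)

centroids = fit_centroids(X, y)
importances = permutation_importance(centroids, X, y)
print("training accuracy:", accuracy(centroids, X, y))
for taxon, score in enumerate(importances):
    print(f"taxon {taxon}: importance {score:.3f}")
```

Taxon 0 should receive by far the largest importance, because shuffling it destroys the only signal that was built into the data. Because counts are converted to proportions, a rise in taxon 0 also lowers the relative abundance of taxa 1 and 2, so they can show small nonzero importance even though their raw counts were generated as pure noise. This compositional effect is one reason model explanations on microbiome data need careful biological interpretation. For brevity the importances are computed on the training data; a held-out test set is the more usual choice.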
Pages: 1092-1107
Page count: 16
References
124 in total
[1] Aasmets O, 2020, MACHINE LEARNING REV
[2] Ecosystem-wide metagenomic binning enables prediction of ecological niches from genomes [J].
Alneberg, Johannes; Bennke, Christin; Beier, Sara; Bunse, Carina; Quince, Christopher; Ininbergs, Karolina; Riemann, Lasse; Ekman, Martin; Juergens, Klaus; Labrenz, Matthias; Pinhassi, Jarone; Andersson, Anders F.
COMMUNICATIONS BIOLOGY, 2020, 3 (01)
[3] [Anonymous], 2002, Adv. Neural Inf. Process. Syst.
[4] Visualizing the effects of predictor variables in black box supervised learning models [J].
Apley, Daniel W.; Zhu, Jingyu.
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2020, 82 (04): 1059-1086
[5] Bathaee, 2017, Harvard Journal of Law Technology, V31, P889
[6] Microbiome Data Accurately Predicts the Postmortem Interval Using Random Forest Regression Models [J].
Belk, Aeriel; Xu, Zhenjiang Zech; Carter, David O.; Lynne, Aaron; Bucheli, Sibyl; Knight, Rob; Metcalf, Jessica L.
GENES, 2018, 9 (02)
[7] Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets [J].
Belkina, Anna C.; Ciccolella, Christopher O.; Anno, Rina; Halpert, Richard; Spidlen, Josef; Snyder-Cappione, Jennifer E.
NATURE COMMUNICATIONS, 2019, 10 (1)
[8] A random forest guided tour [J].
Biau, Gerard; Scornet, Erwan.
TEST, 2016, 25 (02): 197-227
[9] Bishop C.M., 2006, Pattern Recognition and Machine Learning
[10] Bogart E, 2019, Genome Biology, V20, P1