Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 371
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and to quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
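The cross-study analysis described in the abstract (train on some cohorts, test on a cohort the model has never seen) can be illustrated with a minimal sketch. This is not the code of the published framework; the function name `leave_one_study_out` and the toy cohort labels are hypothetical and serve only to show how samples might be partitioned so that no study contributes to both the training and the test set.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence, Tuple


def leave_one_study_out(
    studies: Sequence[str],
) -> Iterator[Tuple[str, List[int], List[int]]]:
    """Yield (held_out_study, train_indices, test_indices), one split per study.

    `studies[i]` is the study label of sample i. Each split tests on all
    samples of one study and trains on the samples of every other study.
    """
    # Group sample indices by the study they come from.
    by_study: Dict[str, List[int]] = defaultdict(list)
    for idx, study in enumerate(studies):
        by_study[study].append(idx)

    # Hold out each study in turn (sorted for a deterministic order).
    for held_out in sorted(by_study):
        test_idx = by_study[held_out]
        train_idx = [
            i
            for study in sorted(by_study)
            if study != held_out
            for i in by_study[study]
        ]
        yield held_out, train_idx, test_idx


# Made-up study labels for six samples (illustrative only, not real data).
toy_studies = ["cohort_a", "cohort_a", "cohort_b", "cohort_b", "cohort_c", "cohort_c"]
splits = list(leave_one_study_out(toy_studies))
for held_out, train_idx, test_idx in splits:
    print(held_out, train_idx, test_idx)
```

Within-study cross-validation, by contrast, would split the samples of a single study into folds. Comparing the accuracy obtained under the two schemes is one way to quantify how well a model generalizes across cohorts.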
Pages: 26