Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 371
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] University of Trento, Centre for Integrative Biology (CIBIO), Trento, Italy
[2] City University of New York (CUNY), Graduate School of Public Health and Health Policy, New York, NY 10021, USA
Funding
US National Science Foundation; US National Institutes of Health
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches for metagenomics-based prediction tasks and for the quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and the presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed on a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were generally improved by feature selection and by the use of strain-specific markers instead of species-level taxonomic abundances. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, adding healthy (control) samples from other studies to the training sets improved disease-prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
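The abstract describes training disease classifiers on species-level relative abundances and then checking how well they transfer across cohorts. The following is a minimal, dependency-free sketch of one common way to set up such a cross-study check, leave-one-study-out evaluation scored by ROC AUC. It is not the MetAML implementation and may differ from the paper's exact protocol. The nearest-centroid classifier is a deliberately simple stand-in for the stronger learners a real framework would use, and every function name, the `extra_controls` parameter, and the toy data are hypothetical.

```python
from statistics import fmean


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney U statistic (1 = case, 0 = control).

    Ties between a case and a control count as half a correct ordering.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in classifier: the mean abundance profile per class."""
    model = {}
    for c in (0, 1):
        rows = [x for x, yi in zip(X, y) if yi == c]
        if not rows:
            raise ValueError(f"training set has no samples of class {c}")
        model[c] = [fmean(col) for col in zip(*rows)]
    return model


def disease_score(model, x):
    """Positive when x is closer to the case centroid than to the control one."""
    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return sqdist(x, model[0]) - sqdist(x, model[1])


def leave_one_study_out(studies, extra_controls=()):
    """Hold out each study in turn, train on the rest, score the held-out one.

    studies: {study_name: (profiles, labels)}; each profile is a list of
    species-level relative abundances in a shared feature order.
    extra_controls: optional healthy profiles from other sources, added to
    every training set with label 0 (hypothetical parameter).
    Returns {held_out_study: AUC}.
    """
    results = {}
    for held_out, (X_test, y_test) in studies.items():
        X_train, y_train = [], []
        for name, (X, y) in studies.items():
            if name != held_out:
                X_train.extend(X)
                y_train.extend(y)
        X_train.extend(extra_controls)
        y_train.extend([0] * len(extra_controls))
        model = fit_nearest_centroid(X_train, y_train)
        scores = [disease_score(model, x) for x in X_test]
        results[held_out] = auc(scores, y_test)
    return results


if __name__ == "__main__":
    # Hypothetical toy cohorts: two species, relative abundances sum to 1.
    studies = {
        "study_A": ([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.3, 0.7]], [0, 0, 1, 1]),
        "study_B": ([[0.85, 0.15], [0.25, 0.75]], [0, 1]),
        "study_C": ([[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]], [0, 1, 1]),
    }
    print(leave_one_study_out(studies))
    print(leave_one_study_out(studies, extra_controls=[[0.95, 0.05]]))
```

Running the module prints one AUC per held-out study. The `extra_controls` argument only shows where healthy samples from other cohorts would enter each training set. It does not reproduce the paper's result. A nearest-centroid model was chosen here solely because it needs nothing beyond the standard library. Within-study cross-validation would follow the same pattern, with folds drawn from a single cohort instead of whole studies.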
Pages: 26