Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 371
Authors
Pasolli, Edoardo [1]
Truong, Duy Tin [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and to quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease-prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
Pages: 26