Machine Learning Meta-analysis of Large Metagenomic Datasets: Tools and Biological Insights

Cited by: 371
Authors
Pasolli, Edoardo [1]
Duy Tin Truong [1]
Malik, Faizan [2]
Waldron, Levi [2]
Segata, Nicola [1]
Affiliations
[1] Univ Trento, Ctr Integrat Biol, Trento, Italy
[2] CUNY, Grad Sch Publ Hlth & Hlth Policy, New York, NY 10021 USA
Funding
National Science Foundation (USA); National Institutes of Health (USA)
Keywords
MULTICATEGORY CLASSIFICATION METHODS; HUMAN GUT MICROBIOME; COMPREHENSIVE EVALUATION; FECAL MICROBIOTA; GENE-EXPRESSION; VALIDATION; PREDICTION; REGRESSION; SELECTION;
DOI
10.1371/journal.pcbi.1004977
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Shotgun metagenomic analysis of the human-associated microbiome provides a rich set of microbial features for prediction and biomarker discovery in the context of human diseases and health conditions. However, the use of such high-resolution microbial features presents new challenges, and validated computational tools for learning tasks are lacking. Moreover, classification rules have scarcely been validated in independent studies, posing questions about the generality and generalization of disease-predictive models across cohorts. In this paper, we comprehensively assess approaches to metagenomics-based prediction tasks and for quantitative assessment of the strength of potential microbiome-phenotype associations. We develop a computational framework for prediction tasks using quantitative microbiome profiles, including species-level relative abundances and presence of strain-specific markers. A comprehensive meta-analysis, with particular emphasis on generalization across cohorts, was performed in a collection of 2424 publicly available metagenomic samples from eight large-scale studies. Cross-validation revealed good disease-prediction capabilities, which were in general improved by feature selection and use of strain-specific markers instead of species-level taxonomic abundance. In cross-study analysis, models transferred between studies were in some cases less accurate than models tested by within-study cross-validation. Interestingly, the addition of healthy (control) samples from other studies to training sets improved disease prediction capabilities. Some microbial species (most notably Streptococcus anginosus) seem to characterize general dysbiotic states of the microbiome rather than connections with a specific disease. Our results in modelling features of the "healthy" microbiome can be considered a first step toward defining general microbial dysbiosis.
The software framework, microbiome profiles, and metadata for thousands of samples are publicly available at http://segatalab.cibio.unitn.it/tools/metaml.
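The two evaluation schemes contrasted in the abstract, within-study cross-validation and transfer of a model trained on one study to another, can be illustrated with a minimal sketch. This is not the authors' framework: the nearest-centroid classifier, the synthetic data generator, the `shift` parameter (a hypothetical study-specific offset), and all function names are assumptions chosen only to keep the example self-contained.

```python
import random


def make_study(n, shift, seed):
    """Synthetic 'study' (illustrative only): n samples, 5 features.
    Label 1 raises feature 0; `shift` is a hypothetical per-study offset."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) + shift for _ in range(5)]
        row[0] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


def fit_centroids(X, y):
    """Nearest-centroid model: the mean feature vector of each class."""
    cents = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents


def predict(cents, x):
    """Return the label whose centroid is closest (squared Euclidean)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(cents, key=lambda label: dist(cents[label]))


def accuracy(cents, X, y):
    return sum(predict(cents, x) == t for x, t in zip(X, y)) / len(y)


def cross_validate(X, y, k=5):
    """Within-study k-fold cross-validation; returns mean fold accuracy."""
    scores = []
    for fold in range(k):
        test_idx = set(range(fold, len(y), k))
        X_tr = [x for i, x in enumerate(X) if i not in test_idx]
        y_tr = [t for i, t in enumerate(y) if i not in test_idx]
        X_te = [X[i] for i in sorted(test_idx)]
        y_te = [y[i] for i in sorted(test_idx)]
        scores.append(accuracy(fit_centroids(X_tr, y_tr), X_te, y_te))
    return sum(scores) / k


def cross_study(train, test):
    """Train on one whole study, evaluate on a different study."""
    X_tr, y_tr = train
    X_te, y_te = test
    return accuracy(fit_centroids(X_tr, y_tr), X_te, y_te)


if __name__ == "__main__":
    study_a = make_study(100, shift=0.0, seed=1)
    study_b = make_study(100, shift=1.5, seed=2)
    print("within-study CV accuracy (A):", cross_validate(*study_a))
    print("cross-study accuracy (A -> B):", cross_study(study_a, study_b))
```

In this toy setup the study-specific offset plays the role of a cohort effect, so comparing the two printed numbers shows how a model can be assessed both within and across studies. It says nothing about the magnitudes reported in the paper.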
Pages: 26