Applications of Machine Learning in Human Microbiome Studies: A Review on Feature Selection, Biomarker Identification, Disease Prediction and Treatment

Cited by: 172
Authors
Marcos-Zambrano, Laura Judith [1]
Karaduzovic-Hadziabdic, Kanita [2]
Loncar Turukalo, Tatjana [3]
Przymus, Piotr [4]
Trajkovik, Vladimir [5]
Aasmets, Oliver [6, 7]
Berland, Magali [8]
Gruca, Aleksandra [9]
Hasic, Jasminka [10]
Hron, Karel [11]
Klammsteiner, Thomas [12]
Kolev, Mikhail [13]
Lahti, Leo [14]
Lopes, Marta B. [15, 16]
Moreno, Victor [17, 18, 19, 20]
Naskinova, Irina [13]
Org, Elin [6]
Paciencia, Ines [21]
Papoutsoglou, Georgios [22]
Shigdel, Rajesh [23]
Stres, Blaz [24]
Vilne, Baiba [25]
Yousef, Malik [26, 27]
Zdravevski, Eftim [5]
Tsamardinos, Ioannis [22]
Carrillo de Santa Pau, Enrique [1]
Claesson, Marcus J. [28, 29]
Moreno-Indias, Isabel [30, 31]
Truu, Jaak [32]
Affiliations
[1] IMDEA Food Inst, Computat Biol Grp, Precis Nutr & Canc Res Program, Madrid, Spain
[2] Int Univ Sarajevo, Fac Engn & Nat Sci, Sarajevo, Bosnia & Herceg
[3] Univ Novi Sad, Fac Tech Sci, Novi Sad, Serbia
[4] Nicolaus Copernicus Univ, Fac Math & Comp Sci, Torun, Poland
[5] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Skopje, North Macedonia
[6] Univ Tartu, Inst Genom, Estonian Genome Ctr, Tartu, Estonia
[7] Univ Tartu, Inst Mol & Cell Biol, Dept Biotechnol, Tartu, Estonia
[8] Univ Paris Saclay, INRAE, MGP, Jouy En Josas, France
[9] Silesian Tech Univ, Dept Comp Networks & Syst, Gliwice, Poland
[10] Univ Sarajevo, Sch Sci & Technol, Sarajevo, Bosnia & Herceg
[11] Palacky Univ, Dept Math Anal & Applicat Math, Olomouc, Czech Republic
[12] Univ Innsbruck, Dept Microbiol, Innsbruck, Austria
[13] South West Univ Neofit Rilski, Blagoevgrad, Bulgaria
[14] Univ Turku, Dept Comp, Turku, Finland
[15] UNL, NOVA Lab Comp Sci & Informat NOVA LINCS, FCT, Caparica, Portugal
[16] UNL, Ctr Matemat & Aplicacoes CMA, FCT, Caparica, Portugal
[17] Catalan Inst Oncol ICO Barcelona, Oncol Data Analyt Program, Barcelona, Spain
[18] Inst Recerca Biomed Bellvitge IDIBELL, Colorectal Canc Grp, Barcelona, Spain
[19] Consortium Biomed Res Epidemiol & Publ Hlth CIBER, Barcelona, Spain
[20] Univ Barcelona, Dept Clin Sci, Fac Med, Barcelona, Spain
[21] Univ Porto, EPIUnit, Inst Saude Publ, Porto, Portugal
[22] Univ Crete, Dept Comp Sci, Iraklion, Greece
[23] Univ Bergen, Dept Clin Sci, Bergen, Norway
[24] Univ Ljubljana, Dept Anim Sci, Grp Microbiol & Microbial Biotechnol, Ljubljana, Slovenia
[25] Riga Stradins Univ, Bioinformat Res Unit, Riga, Latvia
[26] Zefat Acad Coll, Dept Informat Syst, Safed, Israel
[27] Zefat Acad Coll, Galilee Digital Hlth Res Ctr GDH, Safed, Israel
[28] Univ Coll Cork, Sch Microbiol, Cork, Ireland
[29] Univ Coll Cork, APC Microbiome Ireland, Cork, Ireland
[30] Univ Malaga, Inst Invest Biomed Malaga IBIMA, Hosp Clin Univ Virgen Victoria, Unidad Gest Clin Endocrinol & Nutr, Malaga, Spain
[31] Inst Salud Carlos III, Ctr Invest Biomed Red Fisiopatol Obes & Nutr CIBE, Madrid, Spain
[32] Univ Tartu, Inst Mol & Cell Biol, Tartu, Estonia
Keywords
microbiome; machine learning; disease prediction; biomarker identification; feature selection; COMMUNITIES; KEGG;
DOI
10.3389/fmicb.2021.634511
Chinese Library Classification (CLC) number
Q93 [Microbiology];
Discipline classification code
071005 ; 100705 ;
Abstract
The growing number of microbiome-related studies has notably increased the availability of data on human microbiome composition and function. These studies provide the essential material to explore host-microbiome associations in depth and their relation to the development and progression of various complex diseases. Improved data-analytical tools are needed to exploit all the information in these biological datasets, taking into account the peculiarities of microbiome data, i.e., the compositional, heterogeneous, and sparse nature of these datasets. The ability to predict host phenotypes from taxonomy-informed feature selection, and thereby to associate microbiome profiles with disease states, is beneficial for personalized medicine. In this regard, machine learning (ML) provides new insights into the development of models that can be used for classification and prediction in microbiology, to infer host phenotypes and predict diseases, and to stratify patients by characterizing state-specific microbial signatures from microbial communities. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies; the review was performed as part of the COST Action ML4Microbiome activities. This scoping review focuses on the application of ML in microbiome studies related to association and clinical use for diagnostics, prognostics, and therapeutics. Although the data presented here relate mostly to the bacterial community, many of the algorithms could be applied generally, regardless of the feature type. This literature and software review, which covers a broad topic, follows the scoping review methodology. The manual identification of data sources was complemented with: (1) an automated publication search through the digital libraries of the three major publishers using a natural language processing (NLP) toolkit, and (2) an automated identification of relevant software repositories on GitHub and a ranking of the related research papers based on a learning-to-rank approach.
Pages: 25