Methodology for biomarker discovery with reproducibility in microbiome data using machine learning

Cited by: 5
Authors
Rojas-Velazquez, David [1, 2]
Kidwai, Sarah [1]
Kraneveld, Aletta D. [1, 3]
Tonda, Alberto [4]
Oberski, Daniel [2]
Garssen, Johan [1, 5]
Lopez-Rincon, Alejandro [1, 2]
Affiliations
[1] Univ Utrecht, Utrecht Inst Pharmaceut Sci, Fac Sci, Div Pharmacol, Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Dept Data Sci, Utrecht, Netherlands
[3] Vrije Univ Amsterdam, Fac Sci, Dept Neurosci, Amsterdam, Netherlands
[4] Univ Paris Saclay, Inst Syst Complexes Paris, INRAE, UMR 518, MIA PS, Ile France ISC PIF, UAR 3611 CNRS, Paris, France
[5] Danone Nutr Res, Global Ctr Excellence Immunol, Utrecht, Netherlands
Keywords
Machine learning; Reproducibility; Microbiome; Gut microbiome; Metformin
DOI
10.1186/s12859-024-05639-3
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Background: In recent years, human microbiome studies have received increasing attention because the field is considered a potential source of clinical applications. With advances in omics technologies and AI, research on the discovery of potential biomarkers in the human microbiome using machine learning tools has produced positive outcomes. Despite the promising results, these studies still face several issues, such as datasets with small numbers of samples, inconsistent results, a lack of uniform processing and methodologies, and other factors that lead to a lack of reproducibility in biomedical research. In this work, we propose a methodology that combines the DADA2 pipeline for 16S rRNA sequence processing with Recursive Ensemble Feature Selection (REFS) across multiple datasets to increase reproducibility and obtain robust and reliable results in biomedical research.
Results: Three experiments were performed analyzing microbiome data from patients/cases in Inflammatory Bowel Disease (IBD), Autism Spectrum Disorder (ASD), and Type 2 Diabetes (T2D). In each experiment, we found a biomarker signature in one dataset and applied it to two other datasets as further validation. The effectiveness of the proposed methodology was compared with other feature selection methods, such as K-Best with F-score, and with random selection as a baseline. The Area Under the Curve (AUC) was used as a measure of diagnostic accuracy and as a metric for comparing the results of the proposed methodology with those of other feature selection methods. Additionally, we used the Matthews Correlation Coefficient (MCC) to evaluate the performance of the methodology and to compare it with other feature selection methods.
Conclusions: We developed a methodology for reproducible biomarker discovery in 16S rRNA microbiome sequence analysis, addressing the issues related to data dimensionality, inconsistent results, and validation across independent datasets. The findings from the three experiments, across 9 different datasets, show that the proposed methodology achieved higher accuracy than other feature selection methods. This methodology is a first step toward increasing reproducibility and providing robust and reliable results.
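The abstract names AUC and MCC as its evaluation metrics without defining them. As a minimal sketch, assuming binary labels coded 1 (case) and 0 (control), the standard definitions of both metrics can be written in plain Python. The function names and the toy data below are hypothetical illustrations, not the authors' code or results.

```python
from math import sqrt


def matthews_corrcoef(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (an assumption here): return 0.0 when a
    # row or column of the confusion matrix is empty.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties between a case and a control count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three cases, three controls.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold is an assumption

print("AUC:", roc_auc(y_true, scores))
print("MCC:", matthews_corrcoef(y_true, y_pred))
```

AUC is computed from the classifier scores and does not depend on a decision threshold, whereas MCC is computed from thresholded predictions and uses all four cells of the confusion matrix. This may be one reason to report the two together.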
Pages: 17