Using Expert Driven Machine Learning to Enhance Dynamic Metabolomics Data Analysis

Cited by: 16
Authors
Beirnaert, Charlie [1]
Peeters, Laura [1]
Meysman, Pieter [1]
Bittremieux, Wout [1, 2]
Foubert, Kenn [3]
Custers, Deborah [3]
Van der Auwera, Anastasia [3]
Cuykx, Matthias [4]
Pieters, Luc [3]
Covaci, Adrian [4]
Laukens, Kris [1]
Affiliations
[1] Univ Antwerp, Dept Math & Comp Sci, Adrem Data Lab, B-2000 Antwerp, Belgium
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Antwerp, Dept Pharmaceut Sci, Nat Prod & Food Res & Anal NatuRA, B-2000 Antwerp, Belgium
[4] Univ Antwerp, Dept Pharmaceut Sci, Toxicol Ctr, B-2000 Antwerp, Belgium
Source
METABOLITES | 2019 / Vol. 9 / Issue 03
Keywords
machine learning; dynamic metabolomics; data simulation; OPERATING CHARACTERISTIC CURVES
DOI
10.3390/metabo9030054
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Data analysis for metabolomics is undergoing rapid progress thanks to the proliferation of novel tools and the standardization of existing workflows. As untargeted metabolomics datasets and experiments continue to increase in size and complexity, standardized workflows are often not sufficiently sophisticated. In addition, the ground truth for untargeted metabolomics experiments is intrinsically unknown and the performance of tools is difficult to evaluate. Here, the problem of dynamic multi-class metabolomics experiments was investigated using a simulated dataset with a known ground truth. This simulated dataset was used to evaluate the performance of tinderesting, a new and intuitive tool based on gathering expert knowledge to be used in machine learning. The results were compared to EDGE, a statistical method for time series data. This paper presents three novel outcomes. The first is a way to simulate dynamic metabolomics data with a known ground truth based on ordinary differential equations. This method is made available through the MetaboLouise R package. Second, the EDGE tool, originally developed for genomics data analysis, is highly performant in analyzing dynamic case vs. control metabolomics data. Third, the tinderesting method is introduced to analyse more complex dynamic metabolomics experiments. This tool consists of a Shiny app for collecting expert knowledge, which in turn is used to train a machine learning model to emulate the decision process of the expert. This approach does not replace traditional data analysis workflows for metabolomics, but can provide additional information, improved performance or easier interpretation of results. The advantage is that the tool is agnostic to the complexity of the experiment, and thus is easier to use in advanced setups. All code for the presented analysis, MetaboLouise and tinderesting are freely available.
Pages: 13
Related papers
22 in total
  • [1] Emergence of scaling in random networks
    Barabási, AL
    Albert, R
    [J]. SCIENCE, 1999, 286 (5439) : 509 - 512
  • [2] Beirnaert C., R PACKAGE VERSION 01
  • [3] Random forests
    Breiman, L
    [J]. MACHINE LEARNING, 2001, 45 (01) : 5 - 32
  • [4] Development and Validation of an in vitro Experimental GastroIntestinal Dialysis Model with Colon Phase to Study the Availability and Colonic Metabolisation of Polyphenolic Compounds
    Breynaert, Annelies
    Bosscher, Douwina
    Kahnt, Ariane
    Claeys, Magda
    Cos, Paul
    Pieters, Luc
    Hermans, Nina
    [J]. PLANTA MEDICA, 2015, 81 (12-13) : 1075 - 1083
  • [5] SMOTE: Synthetic minority over-sampling technique
    Chawla, Nitesh V.
    Bowyer, Kevin W.
    Hall, Lawrence O.
    Kegelmeyer, W. Philip
    [J]. 2002, American Association for Artificial Intelligence (16)
  • [6] MetaboAnalystR: an R package for flexible and reproducible analysis of metabolomics data
    Chong, Jasmine
    Xia, Jianguo
    [J]. BIOINFORMATICS, 2018, 34 (24) : 4313 - 4314
  • [7] MetaboAnalyst 4.0: towards more transparent and integrative metabolomics analysis
    Chong, Jasmine
    Soufan, Othman
    Li, Carin
    Caraus, Iurie
    Li, Shuzhao
    Bourque, Guillaume
    Wishart, David S.
    Xia, Jianguo
    [J]. NUCLEIC ACIDS RESEARCH, 2018, 46 (W1) : W486 - W494
  • [8] Csardi G., 2005, Int J Comp Syst, V1695, P1
  • [9] COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH
    DELONG, ER
    DELONG, DM
    CLARKEPEARSON, DI
    [J]. BIOMETRICS, 1988, 44 (03) : 837 - 845
  • [10] Workflow4Metabolomics: a collaborative research infrastructure for computational metabolomics
    Giacomoni, Franck
    Le Corguille, Gildas
    Monsoor, Misharl
    Landi, Marion
    Pericard, Pierre
    Petera, Melanie
    Duperier, Christophe
    Tremblay-Franco, Marie
    Martin, Jean-Francois
    Jacob, Daniel
    Goulitquer, Sophie
    Thevenot, Etienne A.
    Caron, Christophe
    [J]. BIOINFORMATICS, 2015, 31 (09) : 1493 - 1495