UmetaFlow: an untargeted metabolomics workflow for high-throughput data processing and analysis

Cited by: 11
Authors
Kontou, Eftychia E. [1]
Walter, Axel [2, 3]
Alka, Oliver [2, 3]
Pfeuffer, Julianus [5, 6]
Sachsenberg, Timo [2, 3]
Mohite, Omkar S. [1]
Nuhamunada, Matin [1]
Kohlbacher, Oliver [2, 3, 4]
Weber, Tilmann [1]
Affiliations
[1] Tech Univ Denmark, Novo Nordisk Fdn Ctr Biosustainabil, Kemitorvet Bldg 220, DK-2800 Lyngby, Denmark
[2] Eberhard Karls Univ Tubingen, Dept Comp Sci, Appl Bioinformat, Sand 14, D-72076 Tubingen, Germany
[3] Univ Tubingen, Inst Bioinformat & Med Informat, Sand 14, D-72076 Tubingen, Germany
[4] Univ Hosp Tubingen, Translat Bioinformat, Schaffhausenstr 77, D-72072 Tubingen, Germany
[5] Zuse Inst Berlin, Visual & Data Centr Comp, Takustr 7, D-14195 Berlin, Germany
[6] Free Univ Berlin, Algorithm Bioinformat, Takustr 9, D-14195 Berlin, Germany
Keywords
Untargeted metabolomics; Processing; Analysis; High-throughput workflow; Software; LIQUID-CHROMATOGRAPHY; MASS; DISCOVERY; SOFTWARE; SPECTRA; OPENMS
DOI
10.1186/s13321-023-00724-w
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Metabolomics experiments generate highly complex datasets whose manual inspection is time- and work-intensive and sometimes error-prone. New methods for automated, fast, reproducible, and accurate data processing and dereplication are therefore required. Here, we present UmetaFlow, a computational workflow for untargeted metabolomics that combines algorithms for data pre-processing, spectral matching, and molecular formula and structural predictions with integration into the GNPS workflows Feature-Based Molecular Networking and Ion Identity Molecular Networking for downstream analysis. UmetaFlow is implemented as a Snakemake workflow, making it easy to use, scalable, and reproducible. For more interactive computing, visualization, and development, the workflow is also implemented in Jupyter notebooks using the Python programming language and a set of Python bindings to the OpenMS algorithms (pyOpenMS). Finally, UmetaFlow is also offered as a web-based graphical user interface for parameter optimization and processing of smaller datasets. UmetaFlow was validated with in-house LC-MS/MS datasets of actinomycetes producing known secondary metabolites, as well as commercial standards; it detected all expected features and accurately annotated 76% of the molecular formulas and 65% of the structures. As a more generic validation, the publicly available MTBLS733 and MTBLS736 datasets were used for benchmarking; UmetaFlow detected more than 90% of all ground truth features and performed exceptionally well in quantification and discriminating marker selection. We anticipate that UmetaFlow will provide a useful platform for the interpretation of large metabolomics datasets.
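The abstract reports the share of ground truth features that the workflow detected. As a rough illustration of how such a figure can be computed, the sketch below counts a ground truth feature as detected when some detected feature lies within an m/z tolerance (in ppm) and a retention time tolerance. This is not the authors' benchmarking code: the names `Feature`, `is_match`, and `feature_recall`, the tolerance defaults, and the sample values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    mz: float  # mass-to-charge ratio
    rt: float  # retention time in seconds


def is_match(truth, detected, ppm_tol=10.0, rt_tol=30.0):
    """Return True if `detected` lies within both tolerances of `truth`.

    The default tolerances are illustrative assumptions, not values from the paper.
    """
    ppm_error = abs(detected.mz - truth.mz) / truth.mz * 1e6
    return ppm_error <= ppm_tol and abs(detected.rt - truth.rt) <= rt_tol


def feature_recall(ground_truth, detected, ppm_tol=10.0, rt_tol=30.0):
    """Fraction of ground truth features matched by at least one detected feature."""
    if not ground_truth:
        raise ValueError("ground_truth must contain at least one feature")
    found = sum(
        1
        for truth in ground_truth
        if any(is_match(truth, d, ppm_tol, rt_tol) for d in detected)
    )
    return found / len(ground_truth)


# Made-up example values: two of the four ground truth features are recovered.
truth = [
    Feature(200.0, 100.0),
    Feature(300.0, 200.0),
    Feature(400.0, 300.0),
    Feature(500.0, 400.0),
]
detected = [
    Feature(200.001, 105.0),  # about 5 ppm and 5 s off: match
    Feature(300.0, 260.0),    # 60 s off: outside the RT tolerance
    Feature(400.01, 300.0),   # about 25 ppm off: outside the m/z tolerance
    Feature(500.0, 400.0),    # exact match
]
recall = feature_recall(truth, detected)
```

The pairwise comparison is quadratic in the number of features, which is fine for a sketch; a real benchmark on large feature tables would more likely sort by m/z and search within a window.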
Pages: 12