The METLIN small molecule dataset for machine learning-based retention time prediction

Cited by: 178
Authors
Domingo-Almenara, Xavier [1, 4]
Guijas, Carlos [1 ]
Billings, Elizabeth [1 ]
Montenegro-Burke, J. Rafael [1 ]
Uritboonthai, Winnie [1 ]
Aisporna, Aries E. [1 ]
Chen, Emily [2 ]
Benton, H. Paul [1 ]
Siuzdak, Gary [1, 3]
Affiliations
[1] Scripps Res Inst, Scripps Ctr Metabol, La Jolla, CA 92037 USA
[2] Scripps Res Inst, Calif Inst Biomed Res Calibr, La Jolla, CA 92037 USA
[3] Scripps Res Inst, Dept Integrat Struct & Computat Biol, La Jolla, CA 92037 USA
[4] EURECAT Technol Ctr Catalonia & Rovira & Virgili, Ctr Omic Sci, Reus, Catalonia, Spain
Funding
National Institutes of Health (USA)
Keywords
METABOLITE IDENTIFICATION; DIFFERENT GRADIENTS; WEB SERVER; FLOW-RATES; LIQUID; ANNOTATION; PROJECTION; SPECTRUM; MODELS;
DOI
10.1038/s41467-019-13680-7
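Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract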
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
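The annotation result in the abstract (correct identity among the top 3 candidates in 70% of cases) rests on ranking candidate structures by how closely their predicted retention time matches the observed one. The following is a minimal sketch of that ranking and of a top-k hit rate, not the authors' implementation: the function names, the absolute-difference scoring, and all numeric values are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def rank_candidates(observed_rt: float, predicted_rt: Dict[str, float]) -> List[str]:
    """Order candidate IDs by |predicted RT - observed RT|, closest first.

    Assumption: plain absolute error is used as the ranking score.
    """
    return sorted(predicted_rt, key=lambda cid: abs(predicted_rt[cid] - observed_rt))


def top_k_rate(cases: List[Tuple[float, Dict[str, float], str]], k: int = 3) -> float:
    """Fraction of cases whose true candidate is ranked within the first k."""
    hits = 0
    for observed_rt, predicted_rt, true_id in cases:
        if true_id in rank_candidates(observed_rt, predicted_rt)[:k]:
            hits += 1
    return hits / len(cases)


# Hypothetical retention times in seconds; candidate IDs are placeholders.
cases = [
    (300.0, {"A": 310.0, "B": 500.0, "C": 295.0, "D": 800.0}, "A"),
    (450.0, {"A": 100.0, "B": 440.0, "C": 460.5, "D": 470.0}, "A"),
]

print(rank_candidates(*cases[0][:2]))
print(top_k_rate(cases, k=3))
```

In this toy input the true candidate falls within the top 3 for the first case but not the second, so the hit rate is 0.5. In practice the predicted retention times would come from a model trained on a dataset such as SMRT.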
Pages: 9