The METLIN small molecule dataset for machine learning-based retention time prediction

Authors
Xavier Domingo-Almenara
Carlos Guijas
Elizabeth Billings
J. Rafael Montenegro-Burke
Winnie Uritboonthai
Aries E. Aisporna
Emily Chen
H. Paul Benton
Gary Siuzdak
Affiliations
[1] The Scripps Research Institute, Scripps Center for Metabolomics
[2] The Scripps Research Institute, California Institute for Biomedical Research (Calibr)
[3] The Scripps Research Institute, Department of Integrative Structural and Computational Biology
[4] EURECAT – Technology Centre of Catalonia & Rovira i Virgili University joint unit, Centre for Omic Sciences
Abstract
Machine learning has been extensively applied in small molecule analysis to predict a wide range of molecular properties and processes including mass spectrometry fragmentation or chromatographic retention time. However, current approaches for retention time prediction lack sufficient accuracy due to limited available experimental data. Here we introduce the METLIN small molecule retention time (SMRT) dataset, an experimentally acquired reverse-phase chromatography retention time dataset covering up to 80,038 small molecules. To demonstrate the utility of this dataset, we deployed a deep learning model for retention time prediction applied to small molecule annotation. Results showed that in 70% of the cases, the correct molecular identity was ranked among the top 3 candidates based on their predicted retention time. We anticipate that this dataset will enable the community to apply machine learning or first principles strategies to generate better models for retention time prediction.
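The top-3 ranking described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how candidates are scored, so this example assumes the simplest possible criterion: candidates are ordered by the absolute difference between their predicted retention time and the observed one. The function names, candidate labels, and retention time values are hypothetical and are not taken from the paper or the SMRT dataset.

```python
from typing import Dict, List, Tuple


def rank_candidates_by_rt(
    observed_rt: float, predicted_rts: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Order candidates by |predicted RT - observed RT|, smallest error first.

    Assumption: absolute error is the ranking criterion. The paper's
    actual scoring scheme may differ.
    """
    scored = [(name, abs(rt - observed_rt)) for name, rt in predicted_rts.items()]
    return sorted(scored, key=lambda pair: pair[1])


def in_top_k(correct: str, ranking: List[Tuple[str, float]], k: int = 3) -> bool:
    """Return True if the correct identity is among the first k ranked candidates."""
    return correct in [name for name, _ in ranking[:k]]


# Hypothetical predicted retention times (seconds) for four candidate
# molecules that share the same mass, and one observed retention time.
candidates = {
    "candidate_A": 512.0,
    "candidate_B": 640.5,
    "candidate_C": 498.2,
    "candidate_D": 830.0,
}
ranking = rank_candidates_by_rt(observed_rt=505.0, predicted_rts=candidates)

for name, error in ranking:
    print(f"{name}: error = {error:.1f} s")
```

Under this assumed criterion, a top-3 rate like the one reported in the abstract would be the fraction of test molecules for which `in_top_k` returns `True`.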