Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics

Cited by: 169
Authors
Bonini, Paolo [2]
Kind, Tobias [1]
Tsugawa, Hiroshi [3, 4]
Barupal, Dinesh Kumar [1]
Fiehn, Oliver [1]
Affiliations
[1] Univ Calif Davis, West Coast Metabol Ctr, UC Davis Genome Ctr, Davis, CA 95616 USA
[2] NGAlab, Tarragona 43762, Spain
[3] RIKEN, Ctr Sustainable Resource Sci, Yokohama, Kanagawa 2300045, Japan
[4] RIKEN, Ctr Integrat Med Sci, Yokohama, Kanagawa 2300045, Japan
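Funding
U.S. National Institutes of Health (NIH);
Keywords
LIQUID; SUSPECT; MODELS;
DOI
10.1021/acs.analchem.9b05765
Chinese Library Classification (CLC) number
O65 [Analytical Chemistry];
Discipline classification code
070302; 081704;
Abstract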
Unidentified peaks remain a major problem in untargeted metabolomics by LC-MS/MS. Confidence in peak annotations increases when MS/MS matching is combined with retention time. Here we show how retention times can be predicted from molecular structures. Two large, publicly available data sets were used for model training in machine learning: the Fiehn hydrophilic interaction liquid chromatography (HILIC) data set of 981 primary metabolites and biogenic amines, and the RIKEN plant specialized metabolome annotation (PlaSMA) database of 852 secondary metabolites, which uses reversed-phase liquid chromatography (RPLC). Five different machine learning algorithms have been integrated into the Retip R package for building the retention time prediction models: random forest, Bayesian-regularized neural network, XGBoost, light gradient-boosting machine (LightGBM), and Keras. A complete workflow for retention time prediction was developed in R. It can be freely downloaded from the GitHub repository (https://www.retip.app). Keras outperformed the other machine learning algorithms in the test set with minimum overfitting, verified by small error differences between training, test, and validation sets. Keras yielded a mean absolute error of 0.78 min for HILIC and 0.57 min for RPLC. Retip is integrated into the mass spectrometry software tools MS-DIAL and MS-FINDER, allowing a complete compound annotation workflow. In a test application on mouse blood plasma samples, we found a 68% reduction in the number of candidate structures when searching all isomers in the MS-FINDER compound identification software. Retention time prediction increases the identification rate in liquid chromatography and subsequently leads to an improved biological interpretation of metabolomics data.
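The two quantities the abstract reports, mean absolute error and the reduction in candidate structures, can be illustrated with a minimal Python sketch. This is not the Retip implementation (Retip is an R package). The names `Candidate`, `mean_absolute_error`, and `filter_by_rt`, the retention time values, and the tolerance of twice the MAE are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate structure with a model-predicted retention time (hypothetical)."""
    name: str
    predicted_rt: float  # minutes


def mean_absolute_error(observed, predicted):
    """Average absolute difference between observed and predicted retention times."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def filter_by_rt(candidates, observed_rt, tolerance):
    """Keep candidates whose predicted RT lies within +/- tolerance of the observed RT."""
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]


# Made-up retention times (minutes) standing in for a held-out test set.
observed = [1.0, 2.5, 4.0, 6.5]
predicted = [1.5, 2.0, 5.0, 6.5]
mae = mean_absolute_error(observed, predicted)

# Assumption: use twice the MAE as the acceptance window around the measured peak.
candidates = [
    Candidate("isomer_A", 3.1),
    Candidate("isomer_B", 5.5),
    Candidate("isomer_C", 3.75),
    Candidate("isomer_D", 9.0),
]
kept = filter_by_rt(candidates, observed_rt=3.5, tolerance=2 * mae)
reduction = 1 - len(kept) / len(candidates)

print(f"MAE: {mae:.2f} min")
print("kept:", [c.name for c in kept])
print(f"candidate reduction: {reduction:.0%}")
```

A wider window keeps more true positives at the cost of removing fewer false candidates, so in practice the tolerance would be tuned against the error observed on a validation set.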
Pages: 7515-7522
Page count: 8