UPLC–MS retention time prediction: a machine learning approach to metabolite identification in untargeted profiling

Cited by: 0
Authors
Arnaud M. Wolfer
Sylvain Lozano
Thierry Umbdenstock
Vincent Croixmarie
Alban Arrault
Philippe Vayer
Affiliations
[1] Technologie Servier
[2] Imperial College, Section of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine
Source
Metabolomics | 2016 / Vol. 12
Keywords
UPLC–MS; Retention time prediction; Support vector regression; Random forest; Self-organizing maps
DOI
Not available
Abstract
Metabolic profiling focuses on the analysis of a wide range of small endogenous molecules in order to understand the response of a living system to perturbations. Ultra-high-performance liquid chromatography–mass spectrometry is a widely employed profiling tool, but its application is limited by difficulties in identifying the detected metabolites. Herein, we demonstrate how the prediction of retention time can help resolve this major issue. We describe a general approach that enables the generation of reliable quantitative structure-retention relationship models tailored to specific chromatographic protocols. This methodology, applied to 442 experimentally characterised standards, employs a combination of random forest and support vector regression models with molecular interaction descriptors. In this unusual application, the VolSurf+ molecular descriptors demonstrated a high ability to describe chromatographic retention. On external validation sets, and for a wide range of chemical classes, predicted values were on average within 13% of the experimentally observed retention time. More importantly, the presented procedure reduced the number of false putative identifications by more than 80%, greatly improving metabolite identification. Furthermore, in 95% of cases, the correct identification was promoted to within the top three metabolite suggestions. This retention time prediction framework can be replicated by different laboratories to suit their profiling platforms and to enhance the value of their standard libraries by providing a new tool for compound identification.
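The abstract does not spell out how predicted retention times are used to discard false putative identifications, so the following is only a minimal sketch of one plausible filtering step, not the authors' procedure. It assumes each candidate annotation already has a retention time predicted by some model; the `Candidate` class, the `rank_candidates` helper, the 30% tolerance window, and all numeric values are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # putative annotation, e.g. from an accurate-mass match
    predicted_rt: float  # retention time predicted by some model, in minutes


def relative_error(predicted_rt: float, observed_rt: float) -> float:
    """Absolute deviation of the prediction, relative to the observed retention time."""
    if observed_rt <= 0:
        raise ValueError("observed_rt must be positive")
    return abs(predicted_rt - observed_rt) / observed_rt


def rank_candidates(candidates, observed_rt, tolerance=0.30):
    """Drop candidates outside the tolerance window, then sort the rest by closeness.

    The default tolerance is an assumed value, not one reported in the abstract.
    """
    kept = [
        c for c in candidates
        if relative_error(c.predicted_rt, observed_rt) <= tolerance
    ]
    return sorted(kept, key=lambda c: relative_error(c.predicted_rt, observed_rt))


# Illustrative values only, not data from the paper.
candidates = [
    Candidate("candidate_A", 2.1),
    Candidate("candidate_B", 5.4),
    Candidate("candidate_C", 4.8),
    Candidate("candidate_D", 9.0),
]
ranked = rank_candidates(candidates, observed_rt=5.0)
print([c.name for c in ranked])
```

In this toy case, two of the four candidates fall outside the window and are removed, and the remaining two are ordered by how closely their predicted retention time matches the observed one. A tighter window removes more false candidates but raises the risk of discarding the correct one, so in practice the tolerance would have to be calibrated against the model's measured prediction error.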