UPLC–MS retention time prediction: a machine learning approach to metabolite identification in untargeted profiling

Authors
Arnaud M. Wolfer
Sylvain Lozano
Thierry Umbdenstock
Vincent Croixmarie
Alban Arrault
Philippe Vayer
Affiliations
[1] Technologie Servier
[2] Section of Computational and Systems Medicine, Department of Surgery and Cancer, Faculty of Medicine, Imperial College
Source
Metabolomics | 2016 / Vol. 12
Keywords
UPLC–MS; Retention time prediction; Support vector regression; Random forest; Self-organizing maps
DOI
Not available
Abstract
Metabolic profiling focuses on the analysis of a wide range of small endogenous molecules in order to understand the response of a living system to perturbations. Ultra-high-performance liquid chromatography–mass spectrometry (UPLC–MS) is a widely employed profiling tool, but its application is limited by difficulties in identifying detected metabolites. Herein, we demonstrate how retention time prediction can help resolve this major issue. We describe a general approach that enables the generation of reliable quantitative structure–retention relationship (QSRR) models tailored to specific chromatographic protocols. This methodology, applied to 442 experimentally characterised standards, employs a combination of random forest and support vector regression models with molecular interaction descriptors. In this unusual application, the VolSurf+ molecular descriptors demonstrated a high ability to describe chromatographic retention. On external validation sets, and for a wide range of chemical classes, predicted values were on average within 13 % of the experimentally observed retention time. More importantly, the presented procedure reduced the number of false putative identifications by more than 80 %, greatly improving metabolite identification. Furthermore, in 95 % of cases, the correct identification was ranked among the top three metabolite suggestions. This retention time prediction framework can be replicated by different laboratories to suit their profiling platforms, enhancing the value of standard libraries by providing a new tool for compound identification.
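The way predicted retention times can prune and rank putative identifications can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes candidate identities have already been obtained for a detected feature (for example, from an accurate-mass match) and that each candidate already has a predicted retention time from a QSRR model. The abstract does not say how the random forest and support vector regression outputs are combined, so `consensus_rt` simply averages them as an assumption. The relative tolerance (`tolerance=0.25`) is a hypothetical value, and the names `Candidate`, `consensus_rt`, `relative_error`, and `rank_candidates` are hypothetical helpers introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Candidate:
    """A putative identification (hypothetical structure for illustration)."""
    name: str
    predicted_rt: float  # QSRR-predicted retention time, in minutes


def consensus_rt(rf_pred: float, svr_pred: float) -> float:
    """ASSUMPTION: combine random forest and SVR predictions by a plain mean."""
    return (rf_pred + svr_pred) / 2.0


def relative_error(predicted: float, observed: float) -> float:
    """Deviation of a predicted retention time as a fraction of the observed one."""
    if observed <= 0:
        raise ValueError("observed retention time must be positive")
    return abs(predicted - observed) / observed


def rank_candidates(observed_rt: float, candidates: Iterable[Candidate],
                    tolerance: float = 0.25) -> List[Candidate]:
    """Discard candidates whose predicted RT falls outside the relative
    tolerance window (hypothetical default), then rank the survivors
    from closest to farthest predicted retention time."""
    kept = [c for c in candidates
            if relative_error(c.predicted_rt, observed_rt) <= tolerance]
    return sorted(kept, key=lambda c: relative_error(c.predicted_rt, observed_rt))


if __name__ == "__main__":
    # Mock feature observed at 4.0 min with four mass-matched candidates (made-up values).
    cands = [Candidate("isomer_A", 3.8), Candidate("isomer_B", 6.5),
             Candidate("isomer_C", 4.6), Candidate("isomer_D", 2.0)]
    print([c.name for c in rank_candidates(4.0, cands)])  # ['isomer_A', 'isomer_C']
```

Filtering removes candidates whose predicted retention is inconsistent with the observed peak, which is one way retention prediction can reduce false putative identifications; sorting the survivors by deviation yields a shortlist such as the top three suggestions mentioned above. In practice the tolerance would be tuned to the model's observed error on a given chromatographic protocol.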