Joint structural annotation of small molecules using liquid chromatography retention order and tandem mass spectrometry data

Cited by: 26
Authors
Bach, Eric [1]
Schymanski, Emma L. [2]
Rousu, Juho [1]
Affiliations
[1] Aalto Univ, Dept Comp Sci, Espoo, Finland
[2] Univ Luxembourg, Luxembourg Ctr Syst Biomed LCSB, Belvaux, Luxembourg
Funding
Academy of Finland; US National Institutes of Health
Keywords
METABOLITE IDENTIFICATION; TIME; INFORMATION; PREDICTION; CLASSIFICATION; SMILIB V2.0
DOI
10.1038/s42256-022-00577-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Structural annotation of small molecules in biological samples remains a key bottleneck in untargeted metabolomics, despite rapid progress in predictive methods and tools during the past decade. Liquid chromatography-tandem mass spectrometry, one of the most widely used analysis platforms, can detect thousands of molecules in a sample, the vast majority of which remain unidentified even with best-of-class methods. Here we present LC-MS²Struct, a machine learning framework for structural annotation of small-molecule data arising from liquid chromatography-tandem mass spectrometry (LC-MS²) measurements. LC-MS²Struct jointly predicts the annotations for a set of mass spectrometry features in a sample, using a novel structured prediction model trained to optimally combine the output of state-of-the-art MS² scorers and observed retention orders. We evaluate our method on a dataset covering all publicly available reversed-phase LC-MS² data in the MassBank reference database, including 4,327 molecules measured using 18 different LC conditions from 16 contributors, greatly expanding the chemical analytical space covered in previous multi-MS² scorer evaluations. LC-MS²Struct obtains significantly higher annotation accuracy than earlier methods and improves the annotation accuracy of state-of-the-art MS² scorers by up to 106%. The use of stereochemistry-aware molecular fingerprints improves prediction performance, which highlights limitations in existing approaches and has strong implications for future computational LC-MS² developments.
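To make the idea of joint annotation concrete, the following toy sketch shows how observed retention order can change which candidate wins for each feature. It is not the model described in the abstract: the function `joint_annotate`, the `weight` parameter, the per-candidate "predicted order value", and all numbers are hypothetical, and the search is a brute-force enumeration that only works for very small inputs.

```python
from itertools import product


def joint_annotate(features, weight=1.0):
    """Pick one candidate per feature by brute force (toy illustration).

    features: list of dicts with
      'rt'         -> observed retention time of the feature
      'candidates' -> list of (name, ms2_score, predicted_order_value)
    The objective is the sum of MS2 scores plus `weight` for every pair of
    features whose predicted order agrees with the observed retention order.
    """
    best_score, best_assignment = float("-inf"), None
    for combo in product(*(f["candidates"] for f in features)):
        score = sum(candidate[1] for candidate in combo)
        for i in range(len(features)):
            for j in range(i + 1, len(features)):
                observed = features[i]["rt"] < features[j]["rt"]
                predicted = combo[i][2] < combo[j][2]
                if observed == predicted:
                    score += weight
        if score > best_score:
            best_score = score
            best_assignment = [candidate[0] for candidate in combo]
    return best_assignment, best_score


# Hypothetical data: two features, two candidate structures each.
features = [
    {"rt": 2.1, "candidates": [("A1", 0.6, 5.0), ("A2", 0.5, 1.0)]},
    {"rt": 7.4, "candidates": [("B1", 0.9, 3.0), ("B2", 0.2, 8.0)]},
]

ms2_only, _ = joint_annotate(features, weight=0.0)
with_order, _ = joint_annotate(features, weight=1.0)
print(ms2_only, with_order)
```

With `weight=0.0` each feature simply takes its best MS² score; with `weight=1.0` the first feature switches to the candidate whose predicted elution order is consistent with the observed retention times.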
Pages: 1224+
Number of pages: 19