Fast metabolite identification with Input Output Kernel Regression

Cited by: 68
Authors
Brouard, Celine [1 ,2 ]
Shen, Huibin [1 ,2 ]
Duehrkop, Kai [3 ]
d'Alche-Buc, Florence [4 ]
Boecker, Sebastian [3 ]
Rousu, Juho [1 ,2 ]
Affiliations
[1] Aalto Univ, Dept Comp Sci, Espoo, Finland
[2] Helsinki Inst Informat Technol, Espoo, Finland
[3] Univ Jena, Chair Bioinformat, Jena, Germany
[4] Univ Paris Saclay, Telecom ParisTech, CNRS, LTCI, Paris, France
Keywords
FRAGMENTATION; PREDICTION; ANNOTATION; DATABASE;
DOI
10.1093/bioinformatics/btw246
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification codes
071010 ; 081704 ;
Abstract
Motivation: An important problem in metabolomics is identifying metabolites from tandem mass spectrometry data. Machine learning methods have recently been proposed to solve this problem by predicting molecular fingerprint vectors and matching these fingerprints against existing molecular structure databases. In this work we propose to address the metabolite identification problem using a structured output prediction approach. This type of approach is not limited to a vector output space and can handle structured output spaces such as the molecule space.

Results: We use the Input Output Kernel Regression method to learn the mapping between tandem mass spectra and molecular structures. The principle of this method is to encode the similarities in the input (spectra) space and the similarities in the output (molecule) space using two kernel functions. The method approximates the spectra-molecule mapping in two phases. The first phase is a regression problem from the input space to the feature space associated with the output kernel. The second phase is a preimage problem, which consists of mapping the predicted output feature vectors back to the molecule space. We show that our approach achieves state-of-the-art accuracy in metabolite identification. Moreover, our method reduces the running times of the training step and the test step by several orders of magnitude compared with previous methods.
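The two phases described in the abstract can be illustrated with a minimal sketch. It assumes kernel ridge regression for the first phase and, for the preimage phase, picks the candidate molecule whose output feature vector has the largest inner product with the prediction. The Gaussian input kernel, the Tanimoto output kernel on binary fingerprints, the regularization value, the toy data, and all function names are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def gaussian_kernel(A, B, gamma=1.0):
    """Input kernel on spectrum feature vectors (assumed choice)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def tanimoto_kernel(A, B):
    """Output kernel on binary fingerprints (assumed choice).
    For non-empty fingerprints k(y, y) = 1, so features have unit norm."""
    inter = A @ B.T
    union = A.sum(1)[:, None] + B.sum(1)[None, :] - inter
    return inter / np.maximum(union, 1e-12)

def fit(X_train, lam=1e-3, gamma=1.0):
    """Phase 1: kernel ridge regression into the output feature space.
    Returns M = (K_x + lam * I)^{-1}; output features stay implicit."""
    K = gaussian_kernel(X_train, X_train, gamma)
    return np.linalg.inv(K + lam * np.eye(len(X_train)))

def score_candidates(M, X_train, Y_train, x, candidates, gamma=1.0):
    """Phase 2: preimage step. Scores each candidate y by
    <psi(y), h(x)> = k_y(y, Y_train) @ M @ k_x(X_train, x)."""
    k_x = gaussian_kernel(X_train, x[None, :], gamma)[:, 0]
    coeffs = M @ k_x
    return tanimoto_kernel(candidates, Y_train) @ coeffs

# Toy data (hypothetical): 4 training spectra as 3-dim feature vectors,
# paired with 5-bit molecular fingerprints.
X_train = np.array([[1.0, 0.0, 0.2],
                    [0.0, 1.0, 0.5],
                    [0.9, 0.8, 0.0],
                    [0.1, 0.0, 1.0]])
Y_train = np.array([[1, 0, 1, 0, 0],
                    [0, 1, 1, 0, 1],
                    [1, 1, 0, 1, 0],
                    [0, 0, 0, 1, 1]], dtype=float)

M = fit(X_train)

# Query with the first training spectrum; candidate 0 is its true molecule.
candidates = Y_train[:3]
scores = score_candidates(M, X_train, Y_train, X_train[0], candidates)
best = int(np.argmax(scores))
print("candidate scores:", scores, "best index:", best)
```

Because the output kernel is only evaluated between candidates and training molecules, the output feature vectors are never built explicitly. Training in this sketch reduces to a single matrix inversion, and scoring a query is a pair of matrix-vector products.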
Pages: 28-36
Page count: 9