Metabolite identification and molecular fingerprint prediction through machine learning

Times cited: 129
Authors
Heinonen, Markus [1, 2]
Shen, Huibin [1]
Zamboni, Nicola [3]
Rousu, Juho [3, 4]
Affiliations
[1] Univ Helsinki, Dept Comp Sci, Helsinki 00014, Finland
[2] Helsinki Inst Informat Technol, Helsinki, Finland
[3] ETH, Dept Biol, Inst Mol Syst Biol, CH-8093 Zurich, Switzerland
[4] Aalto Univ, Dept Informat & Comp Sci, Espoo 00076, Finland
Funding
Academy of Finland
Keywords
MASS-SPECTROMETRY; PROBABILITIES; METABOLOMICS; INFORMATION; SPECTRA; SEARCH
DOI
10.1093/bioinformatics/bts437
Chinese Library Classification (CLC)
Q5 [Biochemistry]
Discipline codes
071010; 081704
Abstract
Motivation: Metabolite identification from tandem mass spectra is an important problem in metabolomics, underpinning subsequent metabolic modelling and network analysis. Yet this task currently requires matching the observed spectrum against a database of reference spectra originating from similar equipment and closely matching operating parameters, a condition that is rarely satisfied in public repositories. Furthermore, computational support for identifying molecules not present in reference databases is lacking. Recent efforts to assemble large public mass spectral databases such as MassBank have opened the door to a new genre of metabolite identification methods.

Results: We introduce a novel framework for predicting molecular characteristics and identifying metabolites from tandem mass spectra using support vector machines. Our approach first predicts a large set of molecular properties of the unknown metabolite from salient tandem mass spectral signals, and then uses the predicted properties to match against large molecule databases such as PubChem. We demonstrate that several molecular properties can be predicted with high accuracy and that they are useful in de novo metabolite identification, where the reference database contains no spectra of the same molecule.
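The second step of the two-step approach described in the abstract (matching predicted properties against a molecule database) can be illustrated with a minimal sketch. It assumes that each molecular property is a binary fingerprint bit, that the per-property SVM outputs a decision value thresholded at zero, and that candidates are ranked by Tanimoto similarity. Tanimoto is a common stand-in and is not necessarily the scoring function used in the paper. The candidate names and fingerprints are hypothetical toy data; in practice they would be computed from database structures such as PubChem entries.

```python
from typing import Dict, List, Sequence, Tuple


def binarize(scores: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Turn per-property SVM decision values into a 0/1 fingerprint.

    Assumption: a positive decision value means the property is predicted present.
    """
    return [1 if s > threshold else 0 for s in scores]


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have equal length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    # Two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def rank_candidates(
    predicted: Sequence[int], candidates: Dict[str, Sequence[int]]
) -> List[Tuple[str, float]]:
    """Score every candidate against the predicted fingerprint, best match first."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in candidates.items()]
    # Descending similarity; ties broken by name so the order is deterministic.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored


if __name__ == "__main__":
    # Hypothetical decision values from five per-property classifiers.
    predicted = binarize([1.3, -0.4, 0.8, 2.1, -1.7])
    # Hypothetical candidate fingerprints, e.g. retrieved by precursor mass.
    candidates = {
        "cand_A": [1, 0, 1, 1, 0],
        "cand_B": [1, 1, 1, 0, 0],
        "cand_C": [0, 0, 0, 1, 1],
    }
    print(rank_candidates(predicted, candidates))
    # → [('cand_A', 1.0), ('cand_B', 0.5), ('cand_C', 0.25)]
```

Because the ranking uses only predicted properties and candidate structures, it does not require a reference spectrum of the unknown molecule, which is the de novo setting the abstract describes.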
Pages: 2333-2341
Page count: 9