MS2DeepScore: a novel deep learning similarity measure to compare tandem mass spectra

Times cited: 81
Authors
Huber, Florian [1]
van der Burg, Sven [1]
van der Hooft, Justin J. J. [2]
Ridder, Lars [1]
Affiliations
[1] Netherlands eSci Ctr, NL-1098 XG Amsterdam, Netherlands
[2] Wageningen Univ, Bioinformat Grp, NL-6708 PB Wageningen, Netherlands
Keywords
Mass spectrometry; Metabolomics; Spectral similarity measure; Supervised machine learning; Deep learning; Databases
DOI
10.1186/s13321-021-00558-4
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Mass spectrometry data is one of the key sources of information in many workflows in medicine and across the life sciences. Mass fragmentation spectra are generally considered to be characteristic signatures of the chemical compound they originate from, yet the chemical structure itself usually cannot be easily deduced from the spectrum. Spectral similarity measures are often used as a proxy for structural similarity, but this approach is strongly limited by the generally poor correlation between the two. Here, we propose MS2DeepScore: a novel Siamese neural network that predicts the structural similarity between two chemical structures solely from their MS/MS fragmentation spectra. Using a cleaned dataset of > 100,000 mass spectra of about 15,000 unique known compounds, we trained MS2DeepScore to predict structural similarity scores for spectrum pairs with high accuracy. In addition, sampling different model variants through Monte-Carlo Dropout is used to further improve the predictions and to assess the model's prediction uncertainty. On 3600 spectra of 500 unseen compounds, MS2DeepScore is able to identify highly reliable structural matches and to predict Tanimoto scores for pairs of molecules from their fragment spectra with a root mean squared error of about 0.15. The prediction uncertainty estimate can also be used to select a subset of predictions with a root mean squared error of about 0.1. Furthermore, we demonstrate that MS2DeepScore outperforms classical spectral similarity measures in retrieving chemically related compound pairs from large mass spectral datasets, thereby illustrating its potential for spectral library matching. Finally, MS2DeepScore can also be used to create chemically meaningful mass spectral embeddings that could be used to cluster large numbers of spectra. Considering also the recently introduced unsupervised Spec2Vec metric, we believe that machine learning-supported mass spectral similarity measures have great potential for a range of metabolomics data processing pipelines.
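The quantities named in the abstract (Tanimoto score as training target, root mean squared error as accuracy measure, and an uncertainty estimate used to select a more reliable subset of predictions) can be illustrated with a minimal Python sketch. This is not the MS2DeepScore implementation or its API. The fingerprints, the sampled predictions, the helper names, and the use of mean and standard deviation to summarize repeated dropout-perturbed predictions are all assumptions made for illustration.

```python
import math
import statistics


def tanimoto(bits_a, bits_b):
    """Tanimoto score of two fingerprints given as sets of 'on' bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(bits_a & bits_b) / len(union)


def rmse(predicted, target):
    """Root mean squared error between two equally long sequences."""
    if len(predicted) != len(target) or not target:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - t) ** 2 for p, t in zip(predicted, target))
    return math.sqrt(squared / len(target))


def summarize_samples(samples):
    """Collapse repeated stochastic predictions into (estimate, spread).

    Mean and population standard deviation are illustrative choices only.
    """
    return statistics.mean(samples), statistics.pstdev(samples)


def select_confident(sampled_predictions, targets, max_spread):
    """Keep only pairs whose sampled predictions agree within max_spread."""
    kept_pred, kept_target = [], []
    for samples, target in zip(sampled_predictions, targets):
        estimate, spread = summarize_samples(samples)
        if spread <= max_spread:
            kept_pred.append(estimate)
            kept_target.append(target)
    return kept_pred, kept_target


# Hypothetical fingerprints (sets of on-bit indices) for three molecules.
fp = {"A": {1, 2, 3, 4}, "B": {3, 4, 5, 6}, "C": {1, 2, 3, 4, 7}}
targets = [
    tanimoto(fp["A"], fp["B"]),
    tanimoto(fp["A"], fp["C"]),
    tanimoto(fp["B"], fp["C"]),
]

# Made-up outputs of several dropout-perturbed forward passes per pair.
sampled = [
    [0.30, 0.35, 0.31],  # A-B: samples agree
    [0.82, 0.78, 0.80],  # A-C: samples agree
    [0.10, 0.60, 0.90],  # B-C: samples disagree
]

all_pred = [summarize_samples(s)[0] for s in sampled]
conf_pred, conf_target = select_confident(sampled, targets, max_spread=0.05)

print("RMSE, all pairs:       ", rmse(all_pred, targets))
print("RMSE, confident subset:", rmse(conf_pred, conf_target))
```

In this toy data the pair with widely scattered samples is also the one with the largest error, so discarding it lowers the error on the remaining pairs. That mirrors the trade-off described in the abstract: fewer predictions, but more reliable ones.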
Pages: 14