MolDiscovery: learning mass spectrometry fragmentation of small molecules

Cited by: 56
Authors
Cao, Liu [1 ]
Guler, Mustafa [1 ]
Tagirdzhanov, Azat [2 ,3 ]
Lee, Yi-Yuan [1 ]
Gurevich, Alexey [2 ]
Mohimani, Hosein [1 ]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] St Petersburg State Univ, St Petersburg, Russia
[3] St Petersburg Electrotech Univ LETI, St Petersburg, Russia
Funding
US National Science Foundation; US National Institutes of Health;
Keywords
COMPREHENSIVE RESOURCE; DISCOVERY; DATABASES; SPECTRA; GENOMICS; SEARCH; ACCESS; IMPACT;
DOI
10.1038/s41467-021-23986-0
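Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Subject classification codes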
07 ; 0710 ; 09 ;
Abstract
Identification of small molecules is a critical task in various areas of life science. Recent advances in mass spectrometry have enabled the collection of tandem mass spectra of small molecules from hundreds of thousands of environments. To identify which molecules are present in a sample, one can search mass spectra collected from the sample against millions of molecular structures in small molecule databases. The existing approaches are based on chemistry domain knowledge, and they fail to explain many of the peaks in mass spectra of small molecules. Here, we present molDiscovery, a mass spectral database search method that improves both efficiency and accuracy of small molecule identification by learning a probabilistic model to match small molecules with their mass spectra. A search of over 8 million spectra from the Global Natural Product Social molecular networking infrastructure shows that molDiscovery correctly identifies six times more unique small molecules than previous methods.
A large number of mass spectra from different samples have been collected, and identifying small molecules from these spectra requires database searches, which is challenging. Here, the authors report molDiscovery, a mass spectral database search method that uses an algorithm to generate mass spectrometry fragmentations and learns a probabilistic model to match small molecules with their mass spectra.
Pages: 13
Related papers
58 in total
  • [1] KNApSAcK Family Databases: Integrated Metabolite-Plant Species Databases for Multifaceted Plant Research
    Afendi, Farit Mochamad
    Okada, Taketo
    Yamazaki, Mami
    Hirai-Morita, Aki
    Nakamura, Yukiko
    Nakamura, Kensuke
    Ikeda, Shun
    Takahashi, Hiroki
    Altaf-Ul-Amin, Md.
    Darusman, Latifah K.
    Saito, Kazuki
    Kanaya, Shigehiko
    [J]. PLANT AND CELL PHYSIOLOGY, 2012, 53 (02) : e1
  • [2] Competitive fragmentation modeling of ESI-MS/MS spectra for putative metabolite identification
    Allen, Felicity
    Greiner, Russ
    Wishart, David
    [J]. METABOLOMICS, 2015, 11 (01) : 98 - 110
  • [3] [Anonymous], 2019, NUCLEIC ACIDS RES, DOI 10.1093/NAR/GKY1033
  • [4] Buckingham J., 1997, Dictionary of natural products, supplement 4, V11
  • [5] Cao L., SOURCE DATA MOLDISCO, DOI 10.5281/zenodo.4680231
  • [6] MetaMiner: A Scalable Peptidogenomics Approach for Discovery of Ribosomal Peptide Natural Products with Blind Modifications from Microbial Communities
    Cao, Liu
    Gurevich, Alexey
    Alexander, Kelsey L.
    Naman, C. Benjamin
    Leao, Tiago
    Glukhov, Evgenia
    Luzzatto-Knaan, Tal
    Vargas, Fernando
    Quinn, Robby
    Bouslimani, Amina
    Nothias, Louis Felix
    Singh, Nitin K.
    Sanders, Jon G.
    Benitez, Rodolfo A. S.
    Thompson, Luke R.
    Hamid, Md-Nafiz
    Morton, James T.
    Mikheenko, Alla
    Shlemov, Alexander
    Korobeynikov, Anton
    Friedberg, Iddo
    Knight, Rob
    Venkateswaran, Kasthuri
    Gerwick, William H.
    Gerwick, Lena
    Dorrestein, Pieter C.
    Pevzner, Pavel A.
    Mohimani, Hosein
    [J]. CELL SYSTEMS, 2019, 9 (06) : 600 - +
  • [7] A Metabolome- and Metagenome-Wide Association Network Reveals Microbial Natural Products and Microbial Biotransformation Products from the Human Microbiota
    Cao, Liu
    Shcherbin, Egor
    Mohimani, Hosein
    [J]. MSYSTEMS, 2019, 4 (04)
  • [8] Doroghazi JR, 2014, NAT CHEM BIOL, V10, P963, DOI 10.1038/nchembio.1659
  • [9] Searching molecular structure databases with tandem mass spectra using CSI:FingerID
    Duehrkop, Kai
    Shen, Huibin
    Meusel, Marvin
    Rousu, Juho
    Boecker, Sebastian
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2015, 112 (41) : 12580 - 12585
  • [10] Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry
    Elias, Joshua E.
    Gygi, Steven P.
    [J]. NATURE METHODS, 2007, 4 (03) : 207 - 214