Efficiently predicting high resolution mass spectra with graph neural networks

Citations: 0
Authors
Murphy, Michael [1, 2]
Jegelka, Stefanie [1 ]
Fraenkel, Ernest [2 ]
Kind, Tobias [3 ]
Healey, David [3 ]
Butler, Thomas [3 ]
Affiliations
[1] MIT, Comp Sci & Artificial Intelligence Lab, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] MIT, Dept Biol Engn, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] Enveda Biosci, Boulder, CO 80301 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202 | 2023 / Vol. 202
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
SPECTROMETRY;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a large database of chemical structures. However, current approaches to spectrum prediction model the output space in ways that force a tradeoff between capturing high resolution mass information and tractable learning. We resolve this tradeoff by casting spectrum prediction as a mapping from an input molecular graph to a probability distribution over chemical formulas. We further discover that a large corpus of mass spectra can be closely approximated using a fixed vocabulary constituting only 2% of all observed formulas. This enables efficient spectrum prediction using an architecture similar to graph classification - GRAFF-MS - achieving significantly lower prediction error and greater retrieval accuracy than previous approaches.
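The following is a minimal Python sketch of the two ideas in the abstract: approximating a corpus of spectra with a small fixed vocabulary of chemical formulas, and reading a spectrum off a probability distribution over that vocabulary. It is not the GRAFF-MS implementation. The function names, the intensity-ranked vocabulary selection, the truncated element mass table, and the use of plain logits in place of a graph neural network output are all assumptions made for illustration.

```python
import math
import re
from collections import defaultdict

# Monoisotopic masses in Da for a few elements (illustrative subset only).
MASS = {"C": 12.0, "H": 1.007825, "N": 14.003074, "O": 15.994915}


def formula_mass(formula):
    """Monoisotopic mass of a formula string such as 'C2H5O'."""
    total = 0.0
    for elem, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MASS[elem] * (int(count) if count else 1)
    return total


def build_vocabulary(spectra, size):
    """Keep the `size` formulas carrying the most total intensity.

    Assumption: each spectrum is a dict mapping formula -> intensity.
    Ranking by summed intensity is a hypothetical selection rule.
    """
    weight = defaultdict(float)
    for spectrum in spectra:
        for formula, intensity in spectrum.items():
            weight[formula] += intensity
    ranked = sorted(weight, key=lambda f: (-weight[f], f))
    return ranked[:size]


def coverage(spectra, vocab):
    """Fraction of total intensity explained by formulas in the vocabulary."""
    kept = set(vocab)
    total = sum(sum(s.values()) for s in spectra)
    covered = sum(i for s in spectra for f, i in s.items() if f in kept)
    return covered / total


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def predict_spectrum(logits, vocab):
    """Turn per-formula logits into (mass, probability) peaks sorted by mass.

    In a real model the logits would come from a network applied to the
    molecular graph; here they are simply passed in.
    """
    probs = softmax(logits)
    return sorted((formula_mass(f), p) for f, p in zip(vocab, probs))
```

Because each vocabulary entry is a formula, its mass is computed exactly rather than rounded into a bin, so the predicted peaks keep high-resolution mass information while the output layer remains a fixed-size classification head.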
Pages: 14