Predicting Peptide Ionization Efficiencies for Electrospray Ionization Mass Spectrometry Using Machine Learning

Times cited: 1
Authors
Kaskow, Justin A. [1]
Hahnert, Eric T. [1]
Porter, Thomas K. [1]
Lu, Yali [2]
Stanev, Valentin
Niu, Chendi [2]
Xu, Wei [2]
Albarghouthi, Methal [2]
Wang, Chunlei [2]
Affiliations
[1] MIT, David H. Koch School of Chemical Engineering Practice, Cambridge, MA 02139, USA
[2] AstraZeneca, Analytical Sciences, BioPharmaceuticals R&D, Gaithersburg, MD 20878, USA
Keywords
ABSOLUTE QUANTIFICATION; PROTEOMICS; PROTEIN
DOI
10.1021/jasms.4c00137
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Mass spectrometry (MS) is an inherently information-rich technique. In this era of big data, label-free MS quantification for nontargeted studies has gained increasing popularity, especially for complex systems. One cornerstone of successful label-free quantification is predictive modeling of ionization efficiency (IE) from solutes' physicochemical properties. While IE modeling has been widely studied for small molecules, reports on peptide IEs are limited. In this study, we leverage the stoichiometric relationship among peptides in trypsin digests of well-characterized monoclonal antibodies (mAbs) to compile a data set of relative ionization efficiencies (RIEs) for 241 peptides. From each peptide's sequence, we computed a set of physicochemical descriptors, which were then used to train machine learning regression models to predict RIEs. Peptides shorter than 20 amino acids had RIEs that were highly correlated with their molecular weight. A random forest (RF) model gave the best predictions of the RIEs in a test data set, with a mean relative error of 23.9%. For larger peptides, a multilayer perceptron (MLP) model improved RIE prediction compared with current best practices, reducing the mean relative error from 60.5% to 32.0%. Finally, we show the application of the RF model to label-free relative protein quantification and to improving the quantification of peptide post-translational modifications (PTMs). This approach to predicting peptide IEs from their sequences enables the development of accurate label-free quantification workflows for peptide and protein analysis.
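The quantities named in the abstract can be illustrated with a short Python sketch. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes every peptide released from a fully digested protein is present at the same molar amount per copy of its sequence, so that its peak area per copy is proportional to its ionization efficiency; it expresses RIEs relative to one chosen reference peptide; and it corrects PTM occupancy by dividing each peak area by its peptide's RIE. All function names and peptide labels are hypothetical.

```python
"""Sketch of RIE bookkeeping for an equimolar tryptic digest.

Assumptions (illustrative, not from the paper): peak areas are already
integrated, peptides from one protein are equimolar per sequence copy,
and RIE is defined relative to a single reference peptide.
All names here are hypothetical.
"""
from statistics import mean


def relative_ionization_efficiencies(areas, copies, reference):
    """Return an RIE for each peptide from one equimolar digest.

    areas: peptide -> integrated peak area
    copies: peptide -> occurrences of that peptide per protein molecule
    reference: peptide whose RIE is defined as 1.0
    """
    # Area per molar copy is proportional to ionization efficiency.
    per_copy = {pep: areas[pep] / copies[pep] for pep in areas}
    ref_signal = per_copy[reference]
    return {pep: signal / ref_signal for pep, signal in per_copy.items()}


def mean_relative_error(predicted, observed):
    """Mean of |predicted - observed| / observed, in percent."""
    return 100.0 * mean(abs(p - o) / o for p, o in zip(predicted, observed))


def modified_fraction(area_mod, rie_mod, area_unmod, rie_unmod):
    """PTM occupancy after correcting each peak area by its peptide's RIE."""
    # Dividing by RIE converts signal back to a quantity proportional to moles.
    mod = area_mod / rie_mod
    unmod = area_unmod / rie_unmod
    return mod / (mod + unmod)
```

For example, with a modified-peptide area of 1e5 (RIE 0.5) and an unmodified-peptide area of 9e5 (RIE 1.0), the uncorrected area ratio implies 10% occupancy, whereas `modified_fraction` returns 2/11, about 18.2%. The gap between these two numbers shows why assuming equal ionization efficiencies for modified and unmodified forms can bias PTM quantification.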
Pages: 2297-2307
Number of pages: 11