ProPept-MT: A Multi-Task Learning Model for Peptide Feature Prediction

Cited by: 1
Authors
He, Guoqiang [1,2]
He, Qingzu [3,4]
Cheng, Jinyan [2]
Yu, Rongwen [2]
Shuai, Jianwei [2]
Cao, Yi [1,2]
Affiliations
[1] Wenzhou Med Univ, Postgrad Training Base Alliance, Wenzhou 325000, Peoples R China
[2] Univ Chinese Acad Sci, Wenzhou Inst, Wenzhou 325000, Peoples R China
[3] Xiamen Univ, Dept Phys, Xiamen 361005, Peoples R China
[4] Xiamen Univ, Fujian Prov Key Lab Soft Funct Mat Res, Xiamen 361005, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
proteomics; retention time; ion intensity; ion mobility; multi-task learning; deep learning; DATA SUBMISSION; PROTEOMICS; PERFORMANCE; SPECTRA; PASEF;
DOI
10.3390/ijms25137237
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
In the realm of quantitative proteomics, data-independent acquisition (DIA) has emerged as a promising approach, offering enhanced reproducibility and quantitative accuracy compared to traditional data-dependent acquisition (DDA) methods. However, the analysis of DIA data is currently hindered by its reliance on project-specific spectral libraries derived from DDA analyses, which not only limits proteome coverage but also proves to be a time-intensive process. To overcome these challenges, we propose ProPept-MT, a novel deep learning-based multi-task prediction model designed to accurately forecast key features such as retention time (RT), ion intensity, and ion mobility (IM). Leveraging advanced techniques such as multi-head attention and BiLSTM for feature extraction, coupled with Nash-MTL for gradient coordination, ProPept-MT demonstrates superior prediction performance. Integrating ion mobility alongside RT, mass-to-charge ratio (m/z), and ion intensity forms 4D proteomics. Then, we outline a comprehensive workflow tailored for 4D DIA proteomics research, integrating the use of 4D in silico libraries predicted by ProPept-MT. Evaluation on a benchmark dataset showcases ProPept-MT's exceptional predictive capabilities, with impressive results including a 99.9% Pearson correlation coefficient (PCC) for RT prediction, a median dot product (DP) of 96.0% for fragment ion intensity prediction, and a 99.3% PCC for IM prediction on the test set. Notably, ProPept-MT manifests efficacy in predicting both unmodified and phosphorylated peptides, underscoring its potential as a valuable tool for constructing high-quality 4D DIA in silico libraries.
Pages: 20