NovoRank: Refinement for De Novo Peptide Sequencing Based on Spectral Clustering and Deep Learning

Cited by: 0
Authors
Seo, Jangho [1]
Choi, Seunghyuk [2]
Paek, Eunok [1,2]
Affiliations
[1] Hanyang Univ, Dept Artificial Intelligence, Seoul 04763, South Korea
[2] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
Funding
National Research Foundation, Singapore;
Keywords
bioinformatics; proteomics; peptide identification; de novo peptide sequencing; spectral clustering; deep learning;
DOI
10.1021/acs.jproteome.4c00300
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010; 081704;
Abstract
De novo peptide sequencing is a valuable technique in mass-spectrometry-based proteomics, as it deduces peptide sequences directly from tandem mass spectra without relying on sequence databases. This database-independent method, however, relies solely on imperfect scoring functions that often lead to erroneous peptide identifications. To boost correct identification, we present NovoRank, a postprocessing tool that employs spectral clustering and machine learning to assign more plausible peptide sequences to spectra. Prior to de novo peptide sequencing, spectral clustering is applied to group similar spectra under the assumption that they originated from the same peptide species. NovoRank then employs a deep learning model, incorporating both cluster-derived proteomic features and individual spectrum characteristics, to rerank the candidate peptides produced by de novo peptide sequencing. Our results show that NovoRank significantly enhances the performance of various de novo peptide sequencing tools, increasing both recall and precision by 0.020 to 0.080 at the peptide-spectrum match (PSM) level. Notably, NovoRank achieves a recall as high as 0.830 for Casanovo at the PSM level. The source code of NovoRank is freely available at https://github.com/HanyangBISLab/NovoRank and is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.
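The reranking idea in the abstract can be illustrated with a toy sketch. This is not NovoRank's model: the abstract describes a deep learning reranker, whereas the code below swaps in a hand-written heuristic that mixes each candidate's de novo score with the fraction of spectra in the same cluster that also propose that peptide. The function name `rerank_by_cluster_support`, the `weight` parameter, the input layout, and the example peptides and scores are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def rerank_by_cluster_support(candidates, cluster_of, weight=0.5):
    """Toy reranker (hypothetical, NOT the NovoRank deep learning model).

    candidates: {spectrum_id: [(peptide, score), ...]} with scores in [0, 1]
    cluster_of: {spectrum_id: cluster_id}, assumed to come from a prior
                spectral clustering step
    Returns {spectrum_id: [(peptide, combined_score), ...]}, best first.
    """
    # Per cluster, count how many member spectra list each peptide.
    support = defaultdict(Counter)
    size = Counter()
    for sid, cands in candidates.items():
        cid = cluster_of[sid]
        size[cid] += 1
        for pep in {p for p, _ in cands}:
            support[cid][pep] += 1

    reranked = {}
    for sid, cands in candidates.items():
        cid = cluster_of[sid]
        scored = [
            # Blend the original score with the cluster support fraction.
            (pep, (1 - weight) * score + weight * support[cid][pep] / size[cid])
            for pep, score in cands
        ]
        reranked[sid] = sorted(scored, key=lambda x: x[1], reverse=True)
    return reranked


# Made-up example: three spectra assumed to fall in one cluster.
candidates = {
    "s1": [("PEPTDIE", 0.60), ("PEPTIDE", 0.55)],
    "s2": [("PEPTIDE", 0.70), ("PEPSIDE", 0.40)],
    "s3": [("PEPTIDE", 0.65)],
}
cluster_of = {"s1": "c1", "s2": "c1", "s3": "c1"}

reranked = rerank_by_cluster_support(candidates, cluster_of)
print(reranked["s1"][0][0])
```

In this made-up input, spectrum `s1` initially ranks `PEPTDIE` first, but every spectrum in the cluster proposes `PEPTIDE`, so the blended score promotes it. A learned model would replace the fixed `weight` with features and parameters fitted to data.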
Pages: 903-910
Page count: 8