TIDD: tool-independent and data-dependent machine learning for peptide identification

Times cited: 2
Authors
Li, Honglan [1]
Na, Seungjin [2]
Hwang, Kyu-Baek [3]
Paek, Eunok [1]
Affiliations
[1] Hanyang Univ, Dept Comp Sci, Seoul 04763, South Korea
[2] Hanyang Univ, Inst Artificial Intelligence Res, Seoul 04763, South Korea
[3] Soongsil Univ, Sch Comp Sci & Engn, Seoul 06978, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Mass spectrometry; Peptide identification; PSM rescoring; Tool-independent; Data-dependent; Machine learning; SHOTGUN PROTEOMICS; MASS-SPECTROMETRY; STATISTICAL-MODEL; ACCURATE; PERCOLATOR; STRATEGY; XTANDEM; MS/MS;
DOI
10.1186/s12859-022-04640-y
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Background: In shotgun proteomics, database search engines assign peptides to tandem mass (MS/MS) spectra, and post-processing (rescoring) approaches applied to the search results have been proposed to increase the number of confident peptide identifications. The most popular post-processing approaches, such as Percolator and PeptideProphet, have increased peptide identification rates by combining multiple scores from database search engines using machine learning techniques. Existing post-processing approaches, however, are limited when dealing with results from new search engines, because their machine learning features must be optimized specifically for each search engine.

Results: We propose a universal post-processing tool, called TIDD, which supports confident peptide identification regardless of the search engine adopted. TIDD can work with any search engine, including newly developed ones, because it calculates universal features that assess peptide-spectrum match (PSM) quality while also accepting additional features provided by search engines or users. Even though it relies on universal features independent of search tools, TIDD showed similar or better performance than Percolator in terms of peptide identification. For MSFragger, which Percolator does not support, TIDD identified 10.23-38.95% more PSMs than target-decoy estimation. TIDD also offers a simple, easy-to-use graphical user interface.

Conclusions: TIDD eliminates the need for optimal feature engineering for each database search tool and can therefore be applied directly to any database search results, including those from newly developed search engines.
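Two ideas in the abstract, a search-engine-independent feature that scores PSM quality from the spectrum itself and the target-decoy baseline against which TIDD's gains are counted, can be illustrated with a minimal Python sketch. This is not TIDD's implementation: the feature shown (fraction of spectrum intensity explained by singly charged b/y ions), all function names, the 0.02 Da tolerance and the 1% threshold are assumptions chosen for illustration. The FDR follows the target-decoy strategy (reference [6]) using the common estimate of decoys divided by targets above a score threshold.

```python
from typing import List, Tuple

# Monoisotopic residue masses (Da) of the 20 standard amino acids (no modifications).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # proton mass (Da)
WATER = 18.010565   # H2O carried by the C-terminal (y) fragment (Da)


def fragment_mzs(peptide: str) -> List[float]:
    """m/z values of singly charged b- and y-ions for an unmodified peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    ions = []
    for i in range(1, len(masses)):
        ions.append(sum(masses[:i]) + PROTON)          # b_i: N-terminal prefix
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y_(n-i): C-terminal suffix
    return ions


def matched_intensity_fraction(peptide: str, peaks: List[Tuple[float, float]],
                               tol: float = 0.02) -> float:
    """Hypothetical engine-independent PSM feature: share of total peak
    intensity lying within `tol` Da of any theoretical b/y ion."""
    theo = fragment_mzs(peptide)
    total = sum(intensity for _, intensity in peaks)
    if total == 0:
        return 0.0
    matched = sum(intensity for mz, intensity in peaks
                  if any(abs(mz - t) <= tol for t in theo))
    return matched / total


def q_values(psms: List[Tuple[float, bool]]) -> List[float]:
    """psms: (score, is_decoy) pairs, higher score = better match.
    FDR at each threshold is estimated as #decoys / #targets; the q-value is
    the smallest FDR of any threshold that still accepts the PSM.
    Tied scores are not grouped in this sketch."""
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:                      # walk from best to worst score
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    q = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):            # sweep from worst score upwards
        running_min = min(running_min, fdr[i])
        q[i] = running_min
    return q


def accepted_targets(psms: List[Tuple[float, bool]], threshold: float = 0.01) -> int:
    """Number of target PSMs whose q-value is at or below `threshold`."""
    return sum(1 for (_, is_decoy), q in zip(psms, q_values(psms))
               if not is_decoy and q <= threshold)


psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False), (6.0, True)]
print(accepted_targets(psms, 0.01))  # → 2
```

In a rescoring setting, features such as `matched_intensity_fraction` would be combined (for example by a learned model) into a new score per PSM, and the improvement would be measured as the change in `accepted_targets` at the same FDR threshold.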
Pages: 12
References (27 in total)
[1]   An Optimized Shotgun Strategy for the Rapid Generation of Comprehensive Human Proteomes [J].
Bekker-Jensen, Dorte B. ;
Kelstrup, Christian D. ;
Batth, Tanveer S. ;
Larsen, Sara C. ;
Haldrup, Christa ;
Bramsen, Jesper B. ;
Sorensen, Karina D. ;
Hoyer, Soren ;
Orntoft, Torben F. ;
Andersen, Claus L. ;
Nielsen, Michael L. ;
Olsen, Jesper V. .
CELL SYSTEMS, 2017, 4 (06) :587-+
[2]   Accurate and Sensitive Peptide Identification with Mascot Percolator [J].
Brosch, Markus ;
Yu, Lu ;
Hubbard, Tim ;
Choudhary, Jyoti .
JOURNAL OF PROTEOME RESEARCH, 2009, 8 (06) :3176-3181
[3]   A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides [J].
Chick, Joel M. ;
Kolippakkam, Deepak ;
Nusinow, David P. ;
Zhai, Bo ;
Rad, Ramin ;
Huttlin, Edward L. ;
Gygi, Steven P. .
NATURE BIOTECHNOLOGY, 2015, 33 (07) :743-749
[4]   Semisupervised model-based validation of peptide identifications in mass spectrometry-based proteomics [J].
Choi, Hyungwon ;
Nesvizhskii, Alexey I. .
JOURNAL OF PROTEOME RESEARCH, 2008, 7 (01) :254-265
[5]   Maximum Likelihood from Incomplete Data via the EM Algorithm [J].
Dempster, A. P. ;
Laird, N. M. ;
Rubin, D. B. .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38
[6]   Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry [J].
Elias, Joshua E. ;
Gygi, Steven P. .
NATURE METHODS, 2007, 4 (03) :207-214
[7]   Comet: An open-source MS/MS sequence database search tool [J].
Eng, Jimmy K. ;
Jahan, Tahmina A. ;
Hoopmann, Michael R. .
PROTEOMICS, 2013, 13 (01) :22-24
[8]   Fast and Accurate Database Searches with MS-GF+Percolator [J].
Granholm, Viktor ;
Kim, Sangtae ;
Navarro, Jose C. F. ;
Sjolund, Erik ;
Smith, Richard D. ;
Kall, Lukas .
JOURNAL OF PROTEOME RESEARCH, 2014, 13 (02) :890-897
[9]   Speeding Up Percolator [J].
Halloran, John T. ;
Zhang, Hantian ;
Kara, Kaan ;
Renggli, Cedric ;
The, Matthew ;
Zhang, Ce ;
Rocke, David M. ;
Kall, Lukas ;
Noble, William Stafford .
JOURNAL OF PROTEOME RESEARCH, 2019, 18 (09) :3353-3359
[10]   Scavager: A Versatile Postsearch Validation Algorithm for Shotgun Proteomics Based on Gradient Boosting [J].
Ivanov, Mark V. ;
Levitsky, Lev I. ;
Bubis, Julia A. ;
Gorshkov, Mikhail V. .
PROTEOMICS, 2019, 19 (03)