Evaluating de novo sequencing in proteomics: already an accurate alternative to database-driven peptide identification?

Cited by: 75
Authors
Muth, Thilo [1]
Renard, Bernhard Y. [1]
Affiliations
[1] Robert Koch Inst, Bioinformat, Berlin, Germany
Keywords
de novo peptide sequencing; benchmarking study; bioinformatics; tandem mass spectrometry; HCD; CID; peptide identification; sequence tags; TANDEM MASS-SPECTROMETRY; PROTEIN IDENTIFICATION; SHOTGUN PROTEOMICS; COMPUTER-PROGRAM; TOP-DOWN; TOOL; MS/MS; PERFORMANCE; ALGORITHMS; SOFTWARE;
DOI
10.1093/bib/bbx033
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
While peptide identifications in mass spectrometry (MS)-based shotgun proteomics are mostly obtained using database search methods, high-resolution spectrum data from modern MS instruments nowadays offer the prospect of improving the performance of computational de novo peptide sequencing. The major benefit of de novo sequencing is that it does not require a reference database to deduce full-length or partial tag-based peptide sequences directly from experimental tandem mass spectrometry spectra. Although various algorithms have been developed for automated de novo sequencing, the prediction accuracy of proposed solutions has been rarely evaluated in independent benchmarking studies. The main objective of this work is to provide a detailed evaluation on the performance of de novo sequencing algorithms on high-resolution data. For this purpose, we processed four experimental data sets acquired from different instrument types from collision-induced dissociation and higher energy collisional dissociation (HCD) fragmentation mode using the software packages Novor, PEAKS and PepNovo. Moreover, the accuracy of these algorithms is also tested on ground truth data based on simulated spectra generated from peak intensity prediction software. We found that Novor shows the overall best performance compared with PEAKS and PepNovo with respect to the accuracy of correct full peptide, tag-based and single-residue predictions. In addition, the same tool outpaced the commercial competitor PEAKS in terms of running time speedup by factors of around 12-17. Despite around 35% prediction accuracy for complete peptide sequences on HCD data sets, taken as a whole, the evaluated algorithms perform moderately on experimental data but show a significantly better performance on simulated data (up to 84% accuracy). Further, we describe the most frequently occurring de novo sequencing errors and evaluate the influence of missing fragment ion peaks and spectral noise on the accuracy. 
Finally, we discuss the potential of de novo sequencing for now becoming more widely used in the field.
Pages: 954-970
Number of pages: 17