Novor: Real-Time Peptide de Novo Sequencing Software

Cited by: 149
Authors
Ma, Bin [1]
Affiliations
[1] Univ Waterloo, Sch Comp Sci, Waterloo, ON N2L 3G1, Canada
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC)
Keywords
Peptide de novo sequencing; Tandem mass spectrometry; Software; Real time; Decision tree; TANDEM MASS-SPECTROMETRY; IDENTIFICATION; DATABASE; MS/MS; SPECTRA; ALGORITHM; ACCURACY; SEARCH; MODEL;
DOI
10.1007/s13361-015-1204-0
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
De novo sequencing software has been widely used in proteomics to sequence new peptides from tandem mass spectrometry data. This study presents a new software tool, Novor, to greatly improve both the speed and accuracy of today's peptide de novo sequencing analyses. To improve the accuracy, Novor's scoring functions are based on two large decision trees built from a peptide spectral library with more than 300,000 spectra with machine learning. Important knowledge about peptide fragmentation is extracted automatically from the library and incorporated into the scoring functions. The decision tree model also enables efficient score calculation and contributes to the speed improvement. To further improve the speed, a two-stage algorithmic approach, namely dynamic programming and refinement, is used. The software program was also carefully optimized. On the testing datasets, Novor sequenced 7%-37% more correct residues than the state-of-the-art de novo sequencing tool, PEAKS, while being an order of magnitude faster. Novor can de novo sequence more than 300 MS/MS spectra per second on a laptop computer. The speed surpasses the acquisition speed of today's mass spectrometer and, therefore, opens a new possibility to de novo sequence in real time while the spectrometer is acquiring the spectral data.
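The abstract names dynamic programming as the first stage of the algorithm but gives no detail. As a rough illustration of the general idea only, and not of Novor's actual implementation, the following Python sketch picks the residue chain whose prefix masses collect the highest total peak score. Everything in it is an assumption made for the example: the function name `de_novo_dp`, the five-residue alphabet with nominal integer masses, and the `peak_scores` dictionary, which stands in for whatever score a trained model would assign to a fragment mass. The decision-tree scoring and the refinement stage described in the abstract are not reproduced.

```python
# Toy de novo sequencing by dynamic programming over integer masses.
# Assumption: a tiny alphabet with nominal residue masses, chosen for illustration.
RESIDUE_MASS = {"G": 57, "A": 71, "S": 87, "P": 97, "V": 99}


def de_novo_dp(total_mass, peak_scores):
    """Return (sequence, score) maximizing the summed scores of matched prefix masses.

    total_mass  -- sum of residue masses of the unknown peptide (integer)
    peak_scores -- dict mapping a prefix mass to its score (hypothetical stand-in
                   for a learned scoring function)
    """
    best = [None] * (total_mass + 1)  # best[m]: top score of any chain with mass m
    back = [None] * (total_mass + 1)  # back[m]: last residue on that best chain
    best[0] = 0
    for m in range(1, total_mass + 1):
        for aa, w in RESIDUE_MASS.items():
            prev = m - w
            if prev < 0 or best[prev] is None:
                continue  # mass m cannot be reached by appending this residue
            cand = best[prev] + peak_scores.get(m, 0)
            if best[m] is None or cand > best[m]:
                best[m], back[m] = cand, aa
    if best[total_mass] is None:
        return None, None  # no residue chain adds up to the requested mass
    # Walk the back-pointers from the full mass down to zero.
    seq, m = [], total_mass
    while m > 0:
        seq.append(back[m])
        m -= RESIDUE_MASS[back[m]]
    return "".join(reversed(seq)), best[total_mass]
```

For a total mass of 314 and unit-score peaks at 57, 128, and 227, `de_novo_dp(314, {57: 1, 128: 1, 227: 1})` returns `("GAVS", 3)`. The table has one entry per integer mass, so the running time grows with the peptide mass times the alphabet size rather than with the number of candidate sequences.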
Pages: 1885-1894
Number of pages: 10