Robust prediction of the MASCOT score for an improved quality assessment in mass spectrometric proteomics

Times cited: 150
Authors
Koenig, Thomas [1, 2]
Menze, Bjoern H. [2]
Kirchner, Marc [1, 2]
Monigatti, Flavio [1, 3]
Parker, Kenneth C. [4]
Patterson, Thomas [1]
Steen, Judith Jebanathirajah [5]
Hamprecht, Fred A. [2]
Steen, Hanno [1, 3]
Affiliations
[1] Childrens Hosp, Dept Pathol, Boston, MA 02115 USA
[2] Heidelberg Univ, Interdisciplinary Ctr Sci Comp, D-69120 Heidelberg, Germany
[3] Harvard Univ, Sch Med, Dept Pathol, Boston, MA 02115 USA
[4] Harvard Univ, Sch Med, Partners Healthcare Ctr Genom & Genet, Cambridge, MA 02139 USA
[5] Harvard Univ, Sch Med, Dept Neurobiol, Boston, MA 02115 USA
Keywords
classification; supervised learning; regression; random forest; peptide identification
DOI
10.1021/pr700859x
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Protein identification by tandem mass spectrometry relies on reliable processing of the acquired data. However, LC-MS/MS routinely generates a large number of poor-quality spectra, and processing these mostly noninformative spectra, with the associated costs, should be avoided. We present a continuous quality score that can be computed very quickly and that approximates the MASCOT score in the case of a correct identification. This score can be used to reject low-quality spectra before database identification, or to flag spectra that appear to carry high information content but could not be identified. The proposed quality score can be calibrated automatically on site, without a manually generated training set. When the score is turned into a classifier using instrument-independent features, the approach performs as well as previously published classifiers and feature sets, and it also gives insight into the behavior of the MASCOT score.
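The abstract describes predicting the MASCOT score from spectrum features and thresholding that prediction to filter spectra. The following is a minimal, dependency-free sketch of that idea, not the authors' implementation. It assumes three hypothetical features (peak count, log total intensity, number of peaks at or above 5% of the base peak) and a small bagged-tree regressor in the spirit of the random forest named in the keywords. The function names (`spectrum_features`, `fit_forest`, `predict_score`, `passes_quality_filter`, `toy_spectrum`), the hyperparameters (25 trees, depth 3, 2 candidate features per split) and the threshold of 30.0 are all illustrative assumptions. The training targets stand in for MASCOT scores of spectra already identified in an earlier search, which is one plausible way such a score could be calibrated on site without manual labels.

```python
import math
import random
from statistics import mean


def spectrum_features(spec):
    # Hypothetical features of a spectrum given as (m/z, intensity) pairs:
    # peak count, log total intensity, number of peaks >= 5% of the base peak.
    intensities = [inten for _, inten in spec]
    if not intensities:
        return [0.0, 0.0, 0.0]
    base = max(intensities)
    n_strong = sum(1 for inten in intensities if inten >= 0.05 * base)
    return [float(len(spec)), math.log1p(sum(intensities)), float(n_strong)]


def _sse(values):
    # Sum of squared errors around the mean: the regression-tree split criterion.
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def _build_tree(X, y, depth, max_depth, min_leaf, n_try, rng):
    # Leaf: predict the mean target score of the training spectra that reach it.
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (sse, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset per split
        for t in sorted({row[f] for row in X}):
            left = [v for row, v in zip(X, y) if row[f] <= t]
            right = [v for row, v in zip(X, y) if row[f] > t]
            if len(left) >= min_leaf and len(right) >= min_leaf:
                sse = _sse(left) + _sse(right)
                if best is None or sse < best[0]:
                    best = (sse, f, t)
    if best is None:
        return mean(y)
    _, f, t = best

    def grow(idx):
        return _build_tree([X[i] for i in idx], [y[i] for i in idx],
                           depth + 1, max_depth, min_leaf, n_try, rng)

    left_idx = [i for i, row in enumerate(X) if row[f] <= t]
    right_idx = [i for i, row in enumerate(X) if row[f] > t]
    return (f, t, grow(left_idx), grow(right_idx))


def _predict_tree(node, x):
    # Internal nodes are (feature, threshold, left, right) tuples; leaves are numbers.
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, min_leaf=2, n_try=2, seed=0):
    # Random-forest style ensemble: each tree is grown on a bootstrap sample.
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(_build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  0, max_depth, min_leaf, n_try, rng))
    return forest


def predict_score(forest, spec):
    # Averaged tree predictions approximate the target score (here, a MASCOT-like score).
    x = spectrum_features(spec)
    return mean(_predict_tree(tree, x) for tree in forest)


def passes_quality_filter(forest, spec, threshold):
    # Classifier view: keep a spectrum for database search only if it clears the threshold.
    return predict_score(forest, spec) >= threshold


def toy_spectrum(n_peaks, n_strong):
    # Synthetic spectrum: n_strong peaks at intensity 100, the rest at intensity 1.
    return [(100.0 + k, 100.0 if k < n_strong else 1.0) for k in range(n_peaks)]


# Synthetic stand-in for spectra already identified by a MASCOT search;
# the target values are made-up scores, not real MASCOT output.
training = []
for k in range(5):
    training.append((toy_spectrum(50 + k, 25 + k), 55.0 + k))  # informative
    training.append((toy_spectrum(8 + k, 1), 5.0 + k))          # mostly noise
X = [spectrum_features(s) for s, _ in training]
y = [score for _, score in training]
forest = fit_forest(X, y)
print(round(predict_score(forest, toy_spectrum(52, 26)), 1))
print(passes_quality_filter(forest, toy_spectrum(8, 1), threshold=30.0))
```

The trees are hand-rolled only to keep the sketch self-contained; a real pipeline would more likely use a maintained implementation such as scikit-learn's `RandomForestRegressor`. Fitting on scores from spectra that were already identified is what lets a filter like this be recalibrated for each instrument without a hand-labeled training set.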
Pages: 3708-3717
Page count: 10
References (22 in total)
[1]   A new algorithm for the evaluation of shotgun peptide sequencing in proteomics: Support vector machine classification of peptide MS/MS spectra and SEQUEST scores [J].
Anderson, DC ;
Li, WQ ;
Payan, DG ;
Noble, WS .
JOURNAL OF PROTEOME RESEARCH, 2003, 2 (02) :137-146
[2]   Automatic Quality Assessment of Peptide Tandem Mass Spectra [J].
Bern, Marshall ;
Goldberg, David ;
McDonald, W. Hayes ;
Yates, John R., III .
BIOINFORMATICS, 2004, 20 :49-54
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]   Choudhary JS, 2001, PROTEOMICS, V1, P651, DOI 10.1002/1615-9861(200104)1:5<651::AID-PROT651>3.0.CO;2-N
[5]   Improving the reliability and throughput of mass spectrometry-based proteomics by spectrum quality filtering [J].
Flikka, K ;
Martens, L ;
Vandekerckhove, J ;
Gevaert, K ;
Eidhammer, I .
PROTEOMICS, 2006, 6 (07) :2086-2094
[6]   Hastie T., 2003, The Elements of Statistical Learning: Data Mining, Inference, and Prediction
[7]   Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search [J].
Keller, A ;
Nesvizhskii, AI ;
Kolker, E ;
Aebersold, R .
ANALYTICAL CHEMISTRY, 2002, 74 (20) :5383-5392
[8]   Use of mass spectrometry-derived data to annotate nucleotide and protein sequence databases [J].
Mann, M ;
Pandey, A .
TRENDS IN BIOCHEMICAL SCIENCES, 2001, 26 (01) :54-61
[9]   Menze BH, MAGN RESON, in press