Statistical Calibration of the SEQUEST XCorr Function

Cited by: 50
Authors
Klammer, Aaron A. [1]
Park, Christopher Y. [1]
Noble, William Stafford [1,2]
Affiliations
[1] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[2] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Keywords
calibration; database search; peptide identification; tandem mass spectrometry; TANDEM MASS-SPECTRA; PEPTIDE IDENTIFICATION; PROTEIN IDENTIFICATION; DATABASE SEARCH; MODEL; PROBABILITY; ALGORITHM
DOI
10.1021/pr8011107
Chinese Library Classification
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
Obtaining accurate peptide identifications from shotgun proteomics liquid chromatography tandem mass spectrometry (LC-MS/MS) experiments requires a score function that consistently ranks correct peptide-spectrum matches (PSMs) above incorrect matches. We have observed that, for the SEQUEST score function XCorr, the inability to discriminate between correct and incorrect PSMs is due in part to spectrum-specific properties of the score distribution. In other words, some spectra score well regardless of which peptides they are scored against, and other spectra score well because they are scored against a large number of peptides. We describe a protocol for calibrating PSM score functions, and we demonstrate its application to XCorr and the preliminary SEQUEST score function Sp. The protocol accounts for spectrum- and peptide-specific effects by calculating p values for each spectrum individually, using only that spectrum's score distribution. We demonstrate that these calculated p values are uniform under a null distribution and therefore accurately measure significance. These p values can be used to estimate the false discovery rate, thereby eliminating the need for an extra search against a decoy database. In addition, we show that the p values are better calibrated than their underlying scores; consequently, when ranking top-scoring PSMs from multiple spectra, p values are better at discriminating between correct and incorrect PSMs. The calibration protocol is generally applicable to any PSM score function for which an appropriate parametric family can be identified.
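The per-spectrum calibration idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions of this sketch only: the abstract does not name the parametric family, so a Gumbel distribution fitted by the method of moments stands in for it; the top score's single-candidate tail probability is adjusted for the number of candidate peptides with a Šidák correction (treating null scores as independent); and the false discovery rate is estimated with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), one standard way to go from p values to FDR. The function names `fit_gumbel_moments`, `gumbel_sf`, `spectrum_p_value`, and `bh_qvalues`, and the toy scan data, are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Euler-Mascheroni constant, used in the Gumbel method-of-moments fit.
EULER_GAMMA = 0.5772156649015329


def fit_gumbel_moments(scores: Sequence[float]) -> Tuple[float, float]:
    """Fit a Gumbel distribution to one spectrum's candidate scores.

    ASSUMPTION: Gumbel via method of moments is an illustrative stand-in for
    "an appropriate parametric family"; it is not taken from the paper.
    Returns (mu, beta): location and scale.
    """
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two candidate scores to fit")
    mean = sum(scores) / n
    var = sum((s - mean) ** 2 for s in scores) / (n - 1)
    if var <= 0.0:
        raise ValueError("candidate scores have zero variance")
    beta = math.sqrt(6.0 * var) / math.pi  # Gumbel variance = (pi * beta)^2 / 6
    mu = mean - EULER_GAMMA * beta  # Gumbel mean = mu + gamma * beta
    return mu, beta


def gumbel_sf(x: float, mu: float, beta: float) -> float:
    """Survival function P(X >= x) = 1 - exp(-exp(-(x - mu) / beta))."""
    z = (x - mu) / beta
    if z < -700.0:  # exp(-z) would overflow; the tail probability is 1
        return 1.0
    return -math.expm1(-math.exp(-z))  # expm1 keeps precision far in the tail


def spectrum_p_value(candidate_scores: Sequence[float]) -> float:
    """p value for one spectrum's top-scoring PSM, using only its own scores.

    ASSUMPTION: the single-candidate tail probability is Sidak-adjusted for
    the number of candidates. The fit here includes the top score itself;
    a real protocol might exclude it.
    """
    mu, beta = fit_gumbel_moments(candidate_scores)
    top = max(candidate_scores)
    p_single = gumbel_sf(top, mu, beta)
    if p_single >= 1.0:  # also avoids log1p(-1.0), which raises ValueError
        return 1.0
    n = len(candidate_scores)
    # 1 - (1 - p_single)^n, computed stably.
    return -math.expm1(n * math.log1p(-p_single))


def bh_qvalues(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values (q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotone q values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        q[i] = running_min
    return q


if __name__ == "__main__":
    # Hypothetical toy data: scan id -> scores against all candidate peptides.
    spectra: Dict[str, List[float]] = {
        "scan_1": [float(i % 7) for i in range(100)] + [30.0],  # standout match
        "scan_2": [float(i % 7) for i in range(100)],  # no standout match
    }
    ids = list(spectra)
    pvals = [spectrum_p_value(spectra[s]) for s in ids]
    for s, p, q in zip(ids, pvals, bh_qvalues(pvals)):
        print(f"{s}: p = {p:.3g}, q = {q:.3g}")
```

With this toy data, scan_1's standout score receives a small p value, while scan_2's top score is unremarkable relative to its own distribution. Because each spectrum is judged only against its own score distribution, a spectrum that scores high against every peptide does not automatically rank above others.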
Pages: 2106-2113
Page count: 8
Related papers
26 in total
[1]   Calibrating e-values for MS2 database search methods [J].
Alves, Gelio ;
Ogurtsov, Aleksey Y. ;
Wu, Wells W. ;
Wang, Guanghui ;
Shen, Rong-Fong ;
Yu, Yi-Kuo .
BIOLOGY DIRECT, 2007, 2 (1)
[2]   Bafna V, 2001, Bioinformatics, V17 Suppl 1, pS13
[3]   Estimating and evaluating the statistics of gapped local-alignment scores [J].
Bailey, TL ;
Gribskov, M .
JOURNAL OF COMPUTATIONAL BIOLOGY, 2002, 9 (03) :575-593
[4]   CONTROLLING THE FALSE DISCOVERY RATE - A PRACTICAL AND POWERFUL APPROACH TO MULTIPLE TESTING [J].
BENJAMINI, Y ;
HOCHBERG, Y .
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 1995, 57 (01) :289-300
[5]   Comparison of probability and likelihood models for peptide identification from tandem mass spectrometry data [J].
Cannon, WR ;
Jarman, KH ;
Webb-Robertson, BJM ;
Baxter, DJ ;
Oehmen, CS ;
Jarman, KD ;
Heredia-Langner, A ;
Auberry, KJ ;
Anderson, GA .
JOURNAL OF PROTEOME RESEARCH, 2005, 4 (05) :1687-1698
[6]   TANDEM: matching proteins with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
BIOINFORMATICS, 2004, 20 (09) :1466-1467
[7]   Durbin R, 1998, Analysis, V356, DOI 10.1017/CBO9780511790492
[8]   Eddy SR, 1997, Maximum likelihood fitting of extreme value distributions
[9]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989
[10]   A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes [J].
Fenyö, D ;
Beavis, RC .
ANALYTICAL CHEMISTRY, 2003, 75 (04) :768-774