A LIKELIHOOD-BASED SCORING METHOD FOR PEPTIDE IDENTIFICATION USING MASS SPECTROMETRY

Cited by: 5
Authors
Li, Qunhua [1]
Eng, Jimmy K. [2]
Stephens, Matthew [3]
Affiliations
[1] Penn State Univ, Dept Stat, University Pk, PA 16802 USA
[2] Univ Washington, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Chicago, Dept Stat & Human Genet, Chicago, IL 60637 USA
Keywords
Generative model; maximum likelihood; peptide identification; proteomics; INDUCED DISSOCIATION SPECTRA; STATISTICAL-MODEL; PROTEIN IDENTIFICATIONS; VALIDATION; FRAGMENTATION; CONFIDENCE; PREDICTION; MOBILE; MS/MS;
DOI
10.1214/12-AOAS568
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract
Mass spectrometry provides a high-throughput approach to identify proteins in biological samples. A key step in the analysis of mass spectrometry data is to identify the peptide sequence that, most probably, gave rise to each observed spectrum. This is often tackled using a database search: each observed spectrum is compared against a large number of theoretical "expected" spectra predicted from candidate peptide sequences in a database, and the best match is identified using some heuristic scoring criterion. Here we provide a more principled, likelihood-based, scoring criterion for this problem. Specifically, we introduce a probabilistic model that allows one to assess, for each theoretical spectrum, the probability that it would produce the observed spectrum. This probabilistic model takes account of peak locations and intensities, in both observed and theoretical spectra, which enables incorporation of detailed knowledge of chemical plausibility in peptide identification. Besides placing peptide scoring on a sounder theoretical footing, the likelihood-based score also has important practical benefits: it provides natural measures for assessing the uncertainty of each identification, and in comparisons on benchmark data it produced more accurate peptide identifications than other methods, including SEQUEST. Although we focus here on peptide identification, our scoring rule could easily be integrated into any downstream analyses that require peptide-spectrum match scores.
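The database-search idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model: it is a toy likelihood that uses peak locations only (no intensities), and the function names, the matching tolerance `tol`, and the probabilities `p_detect` and `p_noise` are all hypothetical values chosen for illustration. Each theoretical peak is assumed to be observed independently with probability `p_detect`, and each unexplained observed peak is treated as noise. Normalizing the likelihoods across candidates gives a rough measure of how uncertain the best match is.

```python
import math

def log_likelihood(observed, theoretical, tol=0.5, p_detect=0.7, p_noise=0.05):
    """Toy log-likelihood of observed peak locations given one theoretical spectrum.

    Assumptions (illustrative only, not from the paper): each theoretical peak
    is detected independently with probability p_detect, and every observed
    peak left unmatched contributes a noise factor p_noise.
    """
    matched = set()
    ll = 0.0
    for t in theoretical:
        # Observed peaks within the tolerance that are not already matched.
        hits = [i for i, o in enumerate(observed)
                if i not in matched and abs(o - t) <= tol]
        if hits:
            # Match the closest available observed peak.
            matched.add(min(hits, key=lambda i: abs(observed[i] - t)))
            ll += math.log(p_detect)
        else:
            ll += math.log(1.0 - p_detect)
    # Observed peaks not explained by the theoretical spectrum count as noise.
    ll += (len(observed) - len(matched)) * math.log(p_noise)
    return ll

def rank_candidates(observed, database, **kwargs):
    """Score every candidate and return (peptide, normalized weight) pairs,
    best first. Weights are likelihoods normalized to sum to one."""
    scores = {pep: log_likelihood(observed, spec, **kwargs)
              for pep, spec in database.items()}
    top = max(scores.values())
    total = sum(math.exp(s - top) for s in scores.values())
    weights = {pep: math.exp(s - top) / total for pep, s in scores.items()}
    return sorted(weights.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical observed spectrum and a two-entry candidate database.
observed = [100.1, 200.0, 300.2]
database = {
    "CANDIDATE_A": [100.0, 200.0, 300.0],
    "CANDIDATE_B": [150.0, 250.0, 300.0],
}
ranked = rank_candidates(observed, database)
```

Here `ranked[0]` is the candidate whose theoretical spectrum best explains the observed peaks, and the gap between its weight and the next one indicates how decisive the match is. A model of the kind the abstract describes would also account for peak intensities and chemical plausibility, which this sketch omits.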
Pages: 1775-1794
Page count: 20