Computational refinement of post-translational modifications predicted from tandem mass spectrometry

Times cited: 12
Authors
Chung, Clement [1, 2]
Liu, Jian [3, 4]
Emili, Andrew [3, 4]
Frey, Brendan J. [1, 2, 3, 5]
Affiliations
[1] Univ Toronto, Dept Comp Sci, Toronto, ON, Canada
[2] Univ Toronto, Probabilist & Stat Inference Grp, Toronto, ON, Canada
[3] Univ Toronto, Banting & Best Dept Med Res, Toronto, ON, Canada
[4] Univ Toronto, Donnelly Ctr Cellular & Biomol Res, Toronto, ON, Canada
[5] Univ Toronto, Dept Elect & Comp Engn, Toronto, ON, Canada
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
PEPTIDE MODIFICATIONS; IDENTIFICATION; SEQUENCES; SOFTWARE; STRATEGY; ACCURATE; SEARCH; MS/MS
DOI
10.1093/bioinformatics/btr017
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: A post-translational modification (PTM) is a naturally occurring chemical modification of a protein. Many of these modifications, such as phosphorylation, are known to play pivotal roles in the regulation of protein function. Accordingly, PTM perturbations have been linked to diverse diseases such as Parkinson's, Alzheimer's, diabetes and cancer. To discover PTMs on a genome-wide scale, there has been a recent surge of interest in analyzing tandem mass spectrometry data, and several unrestrictive (so-called 'blind') PTM search methods have been reported. However, these approaches are subject to noise in mass measurements and in the predicted modification site (amino acid position) within peptides, which can result in false PTM assignments.
Results: To address these issues, we devised a machine learning algorithm, PTMClust, that can be applied to the output of blind PTM search methods to improve prediction quality by suppressing noise in the data and clustering peptides with the same underlying modification to form PTM groups. We show that our technique outperforms two standard clustering algorithms on a simulated dataset. We also show that our algorithm significantly improves sensitivity and specificity when applied to the output of three different blind PTM search engines: SIMS, InsPecT and MODmap. In addition, PTMClust markedly outperforms another PTM refinement algorithm, PTMFinder. We demonstrate that our technique is able to reduce false PTM assignments, improve overall detection coverage and facilitate novel PTM discovery, including terminus modifications. We applied our technique to a large-scale yeast MS/MS proteome profiling dataset and found numerous known and novel PTMs. Accurately identifying modifications in protein sequences is a critical first step for PTM profiling, and thus our approach may benefit routine proteomic analysis.
Pages: 797-806
Page count: 10