Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets

Cited by: 127
Authors
Griss, Johannes [1, 2]
Perez-Riverol, Yasset [2]
Lewis, Steve [2]
Tabb, David L. [3]
Dianes, Jose A. [2]
del-Toro, Noemi [2]
Rurik, Marc [4, 5]
Walzer, Mathias [4, 5]
Kohlbacher, Oliver [4, 5, 6, 7]
Hermjakob, Henning [2, 8]
Wang, Rui [2]
Vizcaino, Juan Antonio [2]
Affiliations
[1] Med Univ Vienna, Div Immunol Allergy & Infect Dis, Dept Dermatol, Vienna, Austria
[2] European Bioinformat Inst EMBL EBI, European Mol Biol Lab, Wellcome Trust Genome Campus, Cambridge, England
[3] Vanderbilt Univ, Sch Med, Dept Biomed Informat, Nashville, TN 37212 USA
[4] Univ Tubingen, Dept Comp Sci, Tubingen, Germany
[5] Univ Tubingen, Ctr Bioinformat, Tubingen, Germany
[6] Univ Tubingen, Quantitat Biol Ctr, Tubingen, Germany
[7] Max Planck Inst Dev Biol, Tubingen, Germany
[8] Natl Ctr Prot Sci, Beijing, Peoples R China
Funding
Biotechnology and Biological Sciences Research Council (UK); Wellcome Trust (UK)
Keywords
TANDEM MASS-SPECTRA; PROTEIN IDENTIFICATION; PEPTIDE IDENTIFICATION; PRIDE DATABASE; DATA SET; SPECTROMETRY; CELLS; PHOSPHOPROTEOME; PROTEOMEXCHANGE; CHROMATOGRAPHY;
DOI
10.1038/NMETH.3902
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average, 75% of spectra analyzed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The Proteomics Identifications (PRIDE) Database Archive is one of the largest MS proteomics public data repositories worldwide. By clustering all tandem MS spectra publicly available in the PRIDE Archive, coming from hundreds of data sets, we were able to consistently characterize spectra into three distinct groups: (1) incorrectly identified, (2) correctly identified but below the set scoring threshold, and (3) truly unidentified. Using multiple complementary analysis approaches, we were able to identify approximately 20% of the consistently unidentified spectra. The complete spectrum-clustering results are available through the new version of the PRIDE Cluster resource (http://www.ebi.ac.uk/pride/cluster). This resource is intended, among other aims, to encourage and simplify further investigation into these unidentified spectra.
Pages: 651 / +
Page count: 8
Related papers
40 in total
[1]   Mass spectrometry-based proteomics [J].
Aebersold, R ;
Mann, M .
NATURE, 2003, 422 (6928) :198-207
[2]   Environmental Stress Affects the Activity of Metabolic and Growth Factor Signaling Networks and Induces Autophagy Markers in MCF7 Breast Cancer Cells [J].
Casado, Pedro ;
Bilanges, Benoit ;
Rajeeve, Vinothini ;
Vanhaesebroeck, Bart ;
Cutillas, Pedro R. .
MOLECULAR & CELLULAR PROTEOMICS, 2014, 13 (03) :836-848
[3]   Kinase-Substrate Enrichment Analysis Provides Insights into the Heterogeneity of Signaling Pathway Activation in Leukemia Cells [J].
Casado, Pedro ;
Rodriguez-Prados, Juan-Carlos ;
Cosulich, Sabina C. ;
Guichard, Sylvie ;
Vanhaesebroeck, Bart ;
Joel, Simon ;
Cutillas, Pedro R. .
SCIENCE SIGNALING, 2013, 6 (268) :rs6
[4]   A mass-tolerant database search identifies a large proportion of unassigned spectra in shotgun proteomics as modified peptides [J].
Chick, Joel M. ;
Kolippakkam, Deepak ;
Nusinow, David P. ;
Zhai, Bo ;
Rad, Ramin ;
Huttlin, Edward L. ;
Gygi, Steven P. .
NATURE BIOTECHNOLOGY, 2015, 33 (07) :743-749
[5]   Confident and sensitive phosphoproteomics using combinations of collision induced dissociation and electron transfer dissociation [J].
Collins, Mark O. ;
Wright, James C. ;
Jones, Matthew ;
Rayner, Julian C. ;
Choudhary, Jyoti S. .
JOURNAL OF PROTEOMICS, 2014, 103 :1-14
[6]   Open source system for analyzing, validating, and storing protein identification data [J].
Craig, R ;
Cortens, JP ;
Beavis, RC .
JOURNAL OF PROTEOME RESEARCH, 2004, 3 (06) :1234-1242
[7]   TANDEM: matching proteins with tandem mass spectra [J].
Craig, R ;
Beavis, RC .
BIOINFORMATICS, 2004, 20 (09) :1466-1467
[8]   Pepitome: Evaluating Improved Spectral Library Search for Identification Complementarity and Quality Assessment [J].
Dasari, Surendra ;
Chambers, Matthew C. ;
Martinez, Misti A. ;
Carpenter, Kristin L. ;
Ham, Amy-Joan L. ;
Vega-Montoto, Lorenzo J. ;
Tabb, David L. .
JOURNAL OF PROTEOME RESEARCH, 2012, 11 (03) :1686-1695
[9]  
Desiere F, 2006, NUCLEIC ACIDS RES, V34, pD655, DOI [10.1093/nar/gkj040, 10.1007/978-1-60761-444-9_19]
[10]   AN APPROACH TO CORRELATE TANDEM MASS-SPECTRAL DATA OF PEPTIDES WITH AMINO-ACID-SEQUENCES IN A PROTEIN DATABASE [J].
ENG, JK ;
MCCORMACK, AL ;
YATES, JR .
JOURNAL OF THE AMERICAN SOCIETY FOR MASS SPECTROMETRY, 1994, 5 (11) :976-989