Combination of Multiple Spectral Libraries Improves the Current Search Methods Used to Identify Missing Proteins in the Chromosome-Centric Human Proteome Project

Cited by: 14
Authors
Cho, Jin-Young [1, 2]
Lee, Hyoung-Joo [1, 2]
Jeong, Seul-Ki [1, 2]
Kim, Kwang-Youl [1, 2]
Kwon, Kyung-Hoon [3]
Yoo, Jong Shin [3]
Omenn, Gilbert S. [4]
Baker, Mark S. [5]
Hancock, William S. [6]
Paik, Young-Ki [1, 2]
Affiliations
[1] Yonsei Univ, Yonsei Proteome Res Ctr, Dept Integrated OMICS Biomed Sci, Seoul 120749, South Korea
[2] Yonsei Univ, Dept Biochem, Coll Life Sci & Biotechnol, Seoul 120749, South Korea
[3] Korea Basic Sci Inst, Ochang, South Korea
[4] Univ Michigan, Ctr Computat Med & Bioinformat, Ann Arbor, MI 48109 USA
[5] Macquarie Univ, Dept Biomed Sci, Fac Med & Hlth Sci, Sydney, NSW 2109, Australia
[6] Northeastern Univ, Boston, MA 02115 USA
Funding
National Research Foundation of Singapore
Keywords
Chromosome-Centric Human Proteome Project; proteomics; missing protein; spectral library search; INDUCED DISSOCIATION SPECTRA; HIGH-THROUGHPUT PROTEOMICS; TANDEM MASS-SPECTRA; PEPTIDE IDENTIFICATION; DATABASE SEARCH; SPECTROMETRY; MS/MS; STRATEGIES; GENOME; PREDICTION;
DOI
10.1021/acs.jproteome.5b00578
Chinese Library Classification number
Q5 [Biochemistry]
Subject classification codes
071010; 081704
Abstract
The approximately 2.9-billion-base-pair human reference genome sequence is known to encode some 20 000 representative proteins. However, 3000 proteins, that is, ~5% of all proteins, have no or very weak proteomic evidence and are still missing. Missing proteins may be present only in rare samples, occur at very low abundance, or be expressed only transiently, all of which complicate their detection and protein profiling. Technical limitations also cause missing proteins to remain unassigned. For example, current mass spectrometry techniques have high detection limits and error rates when applied to complex biological samples, and insufficient proteome coverage in reference sequence databases and spectral libraries raises further major issues. A strategy with greater sensitivity and accuracy in the search for missing proteins is therefore needed. To this end, we used a new strategy that combines a reference spectral library search with a simulated spectral library search to identify missing proteins. We built the human iRefSPL, which contains the original human reference spectral library plus additional peptide sequence-spectrum match entries from other species. We also constructed the human simSPL, which contains the simulated spectra of 173 907 human tryptic peptides generated by MassAnalyzer (version 2.3.1). To demonstrate the enhanced analytical performance of combining the human iRefSPL and simSPL for the identification of missing proteins, we reanalyzed the placental tissue data set (PXD000754). The data from each experiment were analyzed using PeptideProphet, and the results were combined using iProphet. For quality control, we applied the class-specific false-discovery rate (FDR) filtering method, and all results were filtered at an FDR of <1% at both the peptide and protein levels. The quality-controlled results were then cross-checked against the neXtProt DB (2014-09-19 release).
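The class-specific FDR filtering step can be illustrated with a short sketch. It assumes a target-decoy setup in which each match carries a decoy flag and a score where higher is better, and that matches are split into classes (for example, peptides from candidate missing proteins versus already-observed proteins). The `Match` type, the class labels, and the function names are hypothetical; the abstract does not say how the authors implemented the method or how FDR was derived from the PeptideProphet/iProphet output.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Match:
    peptide: str
    score: float    # higher is better (assumption, e.g. an iProphet probability)
    is_decoy: bool  # target-decoy labelling is an assumption of this sketch
    klass: str      # hypothetical class label, e.g. "missing" or "known"


def fdr_filter(matches, threshold=0.01):
    """Keep target matches whose q-value (decoys/targets estimate) is below threshold."""
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for m in ranked:
        if m.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / targets if targets else 1.0)
    # q-value: the lowest FDR reachable at this rank or any lower-scoring rank.
    qvals = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qvals[i] = running
    return [m for m, q in zip(ranked, qvals) if not m.is_decoy and q < threshold]


def class_specific_fdr(matches, threshold=0.01):
    """Estimate FDR separately within each class instead of over the pooled list."""
    by_class = defaultdict(list)
    for m in matches:
        by_class[m.klass].append(m)
    accepted = []
    for group in by_class.values():
        accepted.extend(fdr_filter(group, threshold))
    return accepted
```

Filtering within each class keeps a large, high-scoring class of well-characterized proteins from masking a higher error rate among the smaller set of candidate missing-protein matches.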
The two spectral libraries, iRefSPL and simSPL, were designed so that their proteome coverage does not overlap. They proved complementary in spectral library searching and significantly increased the number of matches. From this trial, 12 new missing proteins were identified that passed the following criterion: at least two peptides of 7 or more amino acids in length, or one peptide of 9 or more amino acids in length, with one or more unique sequences. Thus, the iRefSPL and simSPL combination can help identify peptides that have not been detected by conventional sequence database searching, with a low error rate.
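The acceptance criterion above can be expressed as a small predicate. This is one reading of the wording, offered as a sketch: it assumes each protein's evidence is a list of `(sequence, is_unique)` pairs, counts distinct sequences only, and requires the unique peptide to be at least 7 residues long. The function name and input shape are hypothetical.

```python
from typing import Iterable, Tuple


def passes_missing_protein_criterion(peptides: Iterable[Tuple[str, bool]]) -> bool:
    """Return True if a protein's peptides satisfy the (interpreted) criterion."""
    # Collapse repeated identifications of the same sequence; a sequence counts
    # as unique if any of its identifications was flagged unique.
    unique_flag = {}
    for seq, is_unique in peptides:
        unique_flag[seq] = unique_flag.get(seq, False) or is_unique
    long7 = [s for s in unique_flag if len(s) >= 7]
    long9 = [s for s in unique_flag if len(s) >= 9]
    # Assumption: the required unique sequence must itself meet the length cut-off.
    has_unique = any(unique_flag[s] for s in long7)
    return (len(long7) >= 2 or len(long9) >= 1) and has_unique
```

For example, two distinct 7-residue peptides with one flagged unique pass, whereas a single 7-residue unique peptide does not.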
Pages: 4959–4966
Number of pages: 8