Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0

Cited by: 281
Authors
The, Matthew [1]
MacCoss, Michael J. [2]
Noble, William S. [2,3]
Kall, Lukas [1]
Affiliations
[1] KTH Royal Inst Technol, Sci Life Lab, Sch Biotechnol, Box 1031, S-17121 Solna, Sweden
[2] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
National Institutes of Health (USA);
Keywords
Mass spectrometry - LC-MS/MS; Statistical analysis; Data processing and analysis; Protein inference; Large scale studies; TANDEM MASS-SPECTROMETRY; SHOTGUN PROTEOMICS; PEPTIDE IDENTIFICATION; SPECTRA; PROBABILITIES; DATABASES; INFERENCE; STRIKE;
DOI
10.1007/s13361-016-1460-7
Chinese Library Classification
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing method (grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein) in the Percolator package. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PMID: 24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
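The protein inference idea described in the abstract (group proteins that share the same set of theoretical peptides, then score each group by its single best-scoring peptide) can be illustrated with a minimal sketch. This is not Percolator's actual implementation: the function name `infer_protein_groups`, the input dictionaries, and the convention that a higher score is better are all assumptions made for illustration.

```python
from collections import defaultdict


def infer_protein_groups(peptide_scores, protein_to_peptides):
    """Hypothetical helper, not part of Percolator.

    peptide_scores: dict mapping an identified peptide to its score
                    (assumption: higher score means a better match).
    protein_to_peptides: dict mapping a protein ID to its theoretical peptides.

    Returns a list of (protein_group, best_peptide_score) tuples,
    sorted from best to worst score.
    """
    # Step 1: proteins with identical theoretical peptide sets form one group.
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)

    # Step 2: score each group by its best-scoring identified peptide only.
    results = []
    for peptides, proteins in groups.items():
        observed = [peptide_scores[p] for p in peptides if p in peptide_scores]
        if observed:  # skip groups with no identified peptide
            results.append((tuple(sorted(proteins)), max(observed)))

    results.sort(key=lambda item: item[1], reverse=True)
    return results


if __name__ == "__main__":
    # Toy data, invented purely for illustration.
    scores = {"PEPA": 2.5, "PEPB": 0.7, "PEPC": 1.1}
    proteome = {
        "P1": ["PEPA", "PEPB"],
        "P2": ["PEPB", "PEPA"],  # same peptide set as P1, so grouped with it
        "P3": ["PEPC"],
        "P4": ["PEPD"],          # no identified peptide, so omitted
    }
    for group, score in infer_protein_groups(scores, proteome):
        print(group, score)
```

In a target-decoy setting, the resulting group scores would then be used to estimate protein-level false discovery rates; that step is left out of the sketch.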
Pages: 1719-1727
Page count: 9