Fast and Accurate Protein False Discovery Rates on Large-Scale Proteomics Data Sets with Percolator 3.0

Times cited: 259
Authors
The, Matthew [1]
MacCoss, Michael J. [2]
Noble, William S. [2,3]
Käll, Lukas [1]
Affiliations
[1] KTH Royal Inst Technol, Sci Life Lab, Sch Biotechnol, Box 1031, S-17121 Solna, Sweden
[2] Univ Washington, Sch Med, Dept Genome Sci, Seattle, WA 98195 USA
[3] Univ Washington, Dept Comp Sci & Engn, Seattle, WA 98195 USA
Funding
US National Institutes of Health (NIH)
Keywords
Mass spectrometry - LC-MS/MS; Statistical analysis; Data processing and analysis; Protein inference; Large scale studies; TANDEM MASS-SPECTROMETRY; SHOTGUN PROTEOMICS; PEPTIDE IDENTIFICATION; SPECTRA; PROBABILITIES; DATABASES; INFERENCE; STRIKE;
DOI
10.1007/s13361-016-1460-7
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Percolator is a widely used software tool that increases yield in shotgun proteomics experiments and assigns reliable statistical confidence measures, such as q values and posterior error probabilities, to peptides and peptide-spectrum matches (PSMs) from such experiments. Percolator's processing speed has been sufficient for typical data sets consisting of hundreds of thousands of PSMs. With our new scalable approach, we can now also analyze millions of PSMs in a matter of minutes on a commodity computer. Furthermore, with the increasing awareness of the need for reliable statistics on the protein level, we compared several easy-to-understand protein inference methods and implemented the best-performing one in the Percolator package: grouping proteins by their corresponding sets of theoretical peptides and then considering only the best-scoring peptide for each protein. We used Percolator 3.0 to analyze the data from a recent study of the draft human proteome containing 25 million spectra (PM:24870542). The source code and Ubuntu, Windows, MacOS, and Fedora binary packages are available from http://percolator.ms/ under an Apache 2.0 license.
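The protein inference strategy summarized in the abstract (group proteins by their sets of theoretical peptides, then score each group by its best-scoring peptide) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Percolator's implementation: the input dictionaries, the `decoy_` name prefix marking decoy proteins, and the helper names `group_proteins`, `score_protein_groups`, `is_decoy_group`, and `target_decoy_qvalues` are all hypothetical. The q values use a simple target-decoy estimate (decoys divided by targets at or above each score threshold) that ignores ties and any further corrections Percolator may apply.

```python
from collections import defaultdict


def group_proteins(protein_to_peptides):
    """Group proteins that share exactly the same set of theoretical peptides.

    Returns {tuple_of_member_proteins: frozenset_of_peptides}.
    """
    groups = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return {tuple(sorted(members)): peptide_set
            for peptide_set, members in groups.items()}


def score_protein_groups(groups, peptide_scores):
    """Score each group by its single best-scoring observed peptide.

    Groups with no observed peptide receive no score and are dropped.
    """
    scores = {}
    for members, peptide_set in groups.items():
        observed = [peptide_scores[p] for p in peptide_set if p in peptide_scores]
        if observed:
            scores[members] = max(observed)
    return scores


def is_decoy_group(members, prefix="decoy_"):
    # Assumption: decoy proteins are marked by a hypothetical name prefix.
    return all(m.startswith(prefix) for m in members)


def target_decoy_qvalues(scores):
    """Simple target-decoy q values for target groups (ties not handled)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    targets = decoys = 0
    fdrs = []
    for members, _ in ranked:
        if is_decoy_group(members):
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all groups scoring at or above this one.
        fdrs.append((members, decoys / max(targets, 1)))
    # q value = smallest estimated FDR at this threshold or any looser one.
    qvalues = {}
    running_min = float("inf")
    for members, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        if not is_decoy_group(members):
            qvalues[members] = running_min
    return qvalues


# Toy example with made-up peptides and scores.
protein_to_peptides = {
    "P1": {"AAK", "CCR"},
    "P2": {"AAK", "CCR"},   # same theoretical peptides as P1, so same group
    "P3": {"DDK"},
    "decoy_P1": {"KAA", "RCC"},
}
peptide_scores = {"AAK": 2.5, "CCR": 4.0, "DDK": 1.0, "KAA": 0.5}

groups = group_proteins(protein_to_peptides)
group_scores = score_protein_groups(groups, peptide_scores)
qvalues = target_decoy_qvalues(group_scores)
print(group_scores)
print(qvalues)
```

In the toy data, P1 and P2 collapse into a single group because their theoretical peptide sets are identical, and that group is scored by its best peptide (CCR) rather than by combining evidence from AAK and CCR. With real data, the decoy proteins would come from searching a reversed or shuffled sequence database.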
Pages: 1719–1727
Number of pages: 9