Transfer posterior error probability estimation for peptide identification

Cited by: 11
Authors
Yi, Xinpei [1, 2]
Gong, Fuzhou [1, 2]
Fu, Yan [1, 2]
Affiliations
[1] Chinese Acad Sci, Natl Ctr Math & Interdisciplinary Sci, Key Lab Random Complex Struct & Data Sci, Acad Math & Syst Sci, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Math Sci, Beijing 100049, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Proteomics; Mass spectrometry; Quality control; Posterior error probability; Local false discovery rate; Transfer learning; FALSE DISCOVERY RATES; POSTTRANSLATIONAL MODIFICATIONS; MASS SPECTROMETRISTS; SEARCH STRATEGY; PROTEOMIC DATA; VALIDATION; INFERENCE; MODEL;
DOI
10.1186/s12859-020-3485-y
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Subject classification codes
071010; 081704;
Abstract
Background: In shotgun proteomics, database searching of tandem mass spectra produces a large number of peptide-spectrum matches (PSMs), many of which are false positives. Quality control of PSMs is a multiple hypothesis testing problem, and the false discovery rate (FDR) and the posterior error probability (PEP) are the commonly used statistical confidence measures. PEP, also called local FDR, evaluates the confidence of an individual PSM and is therefore more desirable than FDR, which evaluates the global confidence of a collection of PSMs. PEP can be estimated by decomposing the distribution of PSM scores into its null and alternative components, provided that sufficient data are available. However, in many proteomic studies only a group (subset) of PSMs, e.g. those with specific post-translational modifications, is of interest. The group can be very small, which makes direct PEP estimation from the group data inaccurate, especially in the high-score region where the score threshold is set. Using the whole set of PSMs to estimate the group PEP is also inappropriate, because the null and/or alternative distributions of the group can differ greatly from those of the combined scores.
Results: The transfer PEP algorithm is proposed to estimate the PEPs of peptide identifications in small groups more accurately. Transfer PEP derives the group null distribution through its empirical relationship with the combined null distribution, and estimates the group alternative distribution, as well as the null proportion, using an iterative semi-parametric method. Validated on both simulated data and real proteomic data, transfer PEP showed markedly higher accuracy than direct PEP estimation by either the combined or the separate method.
Conclusions: We presented a novel approach to group PEP estimation for small groups and implemented it for the peptide identification problem in proteomics. The methodology is in principle applicable to small-group PEP estimation problems in other fields.
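As a reading aid for the definition of PEP used in the abstract, the following minimal sketch computes PEP(s) = pi0 * f0(s) / (pi0 * f0(s) + (1 - pi0) * f1(s)) for a two-component score mixture. It is not the transfer PEP algorithm: the Gaussian null and alternative densities, the function names `normal_pdf` and `pep`, and all parameter values are assumptions chosen purely for illustration, and the mixture parameters are taken as known rather than estimated.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution (assumed score model, for illustration)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def pep(score, pi0, null_mean, null_sd, alt_mean, alt_sd):
    """Posterior error probability (local FDR) of a single score.

    pi0 is the null proportion; the null and alternative densities are
    assumed to be Gaussian with the given (hypothetical) parameters.
    """
    null_part = pi0 * normal_pdf(score, null_mean, null_sd)
    alt_part = (1.0 - pi0) * normal_pdf(score, alt_mean, alt_sd)
    return null_part / (null_part + alt_part)


if __name__ == "__main__":
    # Hypothetical mixture: incorrect PSMs score around 0, correct ones around 4.
    for s in [0.0, 1.0, 2.0, 3.0, 4.0]:
        print(s, pep(s, pi0=0.5, null_mean=0.0, null_sd=1.0,
                     alt_mean=4.0, alt_sd=1.0))
```

In this toy setting the PEP falls as the score rises, which is why a score threshold is placed in the high-score region. With a small group, the densities in that region are estimated from very few PSMs, which is the difficulty the abstract describes.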
Pages: 17