Detecting differential and correlated protein expression in label-free shotgun proteomics

Cited by: 298
Authors
Zhang, Bing
VerBerkmoes, Nathan C.
Langston, Michael A.
Uberbacher, Edward
Hettich, Robert L.
Samatova, Nagiza F.
Affiliations
[1] Oak Ridge Natl Lab, Comp Sci & Math Div, Oak Ridge, TN 37831 USA
[2] Oak Ridge Natl Lab, Computat Biol Inst, Oak Ridge, TN 37831 USA
[3] Oak Ridge Natl Lab, Div Chem Sci, Oak Ridge, TN 37831 USA
[4] Oak Ridge Natl Lab, Div Life Sci, Oak Ridge, TN 37831 USA
[5] Univ Tennessee, Dept Comp Sci, Knoxville, TN 37996 USA
Keywords
label-free; LC-MS/MS; shotgun proteomics; differential expression; correlated expression; clustering; Saccharomyces cerevisiae; Rhodopseudomonas palustris
DOI
10.1021/pr0600273
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Recent studies have revealed a relationship between protein abundance and sampling statistics, such as sequence coverage, peptide count, and spectral count, in label-free liquid chromatography-tandem mass spectrometry (LC-MS/MS) shotgun proteomics. The use of sampling statistics offers a promising method of measuring relative protein abundance and detecting differentially expressed or coexpressed proteins. We performed a systematic analysis of various approaches to quantifying differential protein expression in eukaryotic Saccharomyces cerevisiae and prokaryotic Rhodopseudomonas palustris label-free LC-MS/MS data. First, we showed that, among three sampling statistics, the spectral count has the highest technical reproducibility, followed by the less-reproducible peptide count and relatively nonreproducible sequence coverage. Second, we used spectral count statistics to measure differential protein expression in pairwise experiments using five statistical tests: Fisher's exact test, G-test, AC test, t-test, and LPE test. Given the S. cerevisiae data set with spiked proteins as a benchmark and the false positive rate as a metric, our evaluation suggested that the Fisher's exact test, G-test, and AC test can be used when the number of replications is limited (one or two), whereas the t-test is useful with three or more replicates available. Third, we generalized the G-test to increase the sensitivity of detecting differential protein expression under multiple experimental conditions. Out of 1622 identified R. palustris proteins in the LC-MS/MS experiment, the generalized G-test detected 1119 differentially expressed proteins under six growth conditions. Finally, we studied correlated expression of these 1119 proteins by analyzing pairwise expression correlations and by delineating protein clusters according to expression patterns. 
Through pairwise expression correlation analysis, we demonstrated that proteins co-located in the same operon were much more strongly coexpressed than those from different operons. Combining cluster analysis with existing protein functional annotations, we identified six protein clusters with known biological significance. In summary, the proposed generalized G-test using spectral count sampling statistics is a viable methodology for robust quantification of relative protein abundance and for sensitive detection of biologically significant differential protein expression under multiple experimental conditions in label-free shotgun proteomics.
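The spectral-count G-test mentioned in the abstract can be illustrated with a minimal sketch. This is a textbook G-test (likelihood-ratio test) of goodness of fit, not necessarily the authors' exact formulation: the abstract does not say how their generalized G-test handles normalization, replicates, or multiple-testing correction. The function names `g_test` and `chi2_sf` and the example counts are hypothetical. For one protein, the spectral count in each condition is compared with the count expected if the protein's share of all spectra were the same in every condition.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite half-integer series
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += half ** (j - 0.5) / math.gamma(j + 0.5)
    return math.erfc(math.sqrt(half)) + math.exp(-half) * total


def g_test(counts, totals):
    """G-test for one protein.

    counts[i]: spectral count of the protein in condition i
    totals[i]: total spectral count of all proteins in condition i
    Returns (G statistic, p-value) with len(counts) - 1 degrees of freedom.
    """
    pooled = sum(counts)
    grand = sum(totals)
    g = 0.0
    for n, t in zip(counts, totals):
        expected = pooled * t / grand  # count expected under equal abundance
        if n > 0:  # a zero count contributes nothing to the sum
            g += n * math.log(n / expected)
    g *= 2.0
    return g, chi2_sf(g, len(counts) - 1)


if __name__ == "__main__":
    # Hypothetical counts: two conditions, then six conditions
    print(g_test([30, 5], [1000, 1000]))
    print(g_test([12, 3, 25, 8, 0, 14], [900, 1100, 1000, 950, 1050, 1000]))
```

The same function covers the pairwise case (two conditions, one degree of freedom) and the multi-condition case (six growth conditions give five degrees of freedom). When many proteins are tested at once, the resulting p-values would still need a multiple-testing adjustment.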
Pages: 2909-2918
Page count: 10