Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs

Times cited: 96
Authors
Clough, Timothy [1]
Thaminy, Safia [2,3]
Ragg, Susanne [4]
Aebersold, Ruedi [2,5]
Vitek, Olga [1,6]
Affiliations
[1] Purdue Univ, Dept Stat, W Lafayette, IN 47907 USA
[2] Swiss Fed Inst Technol, Inst Mol Syst Biol, Dept Biol, Zurich, Switzerland
[3] Inst Syst Biol, Seattle, WA USA
[4] Indiana Univ, Sch Med, Indianapolis, IN USA
[5] Univ Zurich, Fac Sci, CH-8006 Zurich, Switzerland
[6] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Source
BMC BIOINFORMATICS | 2012 / Vol. 13
Funding
National Institutes of Health (NIH); National Science Foundation (NSF)
Keywords
SPECTROMETRY-BASED PROTEOMICS; MASS-SPECTROMETRY; QUANTITATIVE PROTEOMICS; IDENTIFICATION; NORMALIZATION; EXPRESSION; ABUNDANCE; MODEL;
DOI
10.1186/1471-2105-13-S16-S6
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Background: Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest, however, usually focuses on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. The field therefore needs generic statistical models that can quantify protein levels even in complex study designs.

Results: We propose a general statistical modeling approach for protein quantification in arbitrarily complex experimental designs, such as time-course studies or studies involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions and protein quantification in individual samples or conditions. We implement the approach in MSstats, an open-source R-based software package suitable for researchers with limited background in statistics and programming.

Conclusions: Using two experimental investigations with complex designs as examples, we demonstrate that simultaneous statistical modeling of all the relevant features and conditions yields higher sensitivity in protein significance analysis and higher accuracy in protein quantification than commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
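The abstract describes fitting one model per protein over all of its features and conditions, then reusing that model both for between-condition significance analysis and for per-condition quantification. The Python sketch below illustrates that idea under strong simplifying assumptions: it fits a fixed-effects additive model, log2-intensity = mu + feature effect + condition effect + error, by ordinary least squares with numpy, and it ignores the run, subject, interaction, and random effects that a full analysis of complex designs would likely require. It is not the MSstats model or API; the function names (`fit_protein_model`, `condition_vector`, `compare`) and the toy data are hypothetical.

```python
import numpy as np


def fit_protein_model(rows):
    """Fit log2(y) = mu + feature_i + condition_j + error by least squares.

    Hypothetical helper. rows: (feature, condition, log2_intensity) tuples
    for ONE protein; missing measurements are simply absent rows.
    """
    features = sorted({r[0] for r in rows})
    conditions = sorted({r[1] for r in rows})
    # Treatment coding: the first feature and first condition are baselines.
    n = len(rows)
    p = 1 + (len(features) - 1) + (len(conditions) - 1)
    X = np.zeros((n, p))
    y = np.array([r[2] for r in rows], dtype=float)
    for k, (f, c, _) in enumerate(rows):
        X[k, 0] = 1.0  # intercept
        fi = features.index(f)
        if fi > 0:
            X[k, fi] = 1.0  # feature dummy
        ci = conditions.index(c)
        if ci > 0:
            X[k, len(features) - 1 + ci] = 1.0  # condition dummy
    beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
    df = n - rank  # residual degrees of freedom (must be > 0)
    resid = y - X @ beta
    sigma2 = resid @ resid / df
    cov = sigma2 * np.linalg.pinv(X.T @ X)  # covariance of beta estimates
    return dict(beta=beta, cov=cov, df=df,
                features=features, conditions=conditions)


def condition_vector(model, condition):
    """Linear combination giving a condition's abundance averaged over features."""
    nf = len(model["features"])
    v = np.zeros(len(model["beta"]))
    v[0] = 1.0
    v[1:nf] = 1.0 / nf  # average feature effect (baseline feature is 0)
    ci = model["conditions"].index(condition)
    if ci > 0:
        v[nf - 1 + ci] = 1.0
    return v


def compare(model, a, b):
    """Contrast a - b: estimate, standard error, t statistic, df."""
    c = condition_vector(model, a) - condition_vector(model, b)
    est = c @ model["beta"]
    se = np.sqrt(c @ model["cov"] @ c)
    return est, se, est / se, model["df"]


# Toy data (hypothetical): two features of one protein, two conditions.
rows = [
    ("F1", "A", 10.0), ("F1", "A", 10.2), ("F2", "A", 12.0), ("F2", "A", 12.2),
    ("F1", "B", 11.0), ("F1", "B", 11.4), ("F2", "B", 13.0), ("F2", "B", 13.2),
]
model = fit_protein_model(rows)
est, se, t_stat, df = compare(model, "B", "A")
abundance_A = condition_vector(model, "A") @ model["beta"]
print(f"log2FC(B - A) = {est:.3f}, SE = {se:.3f}, t = {t_stat:.2f}, df = {df}")
```

Because every feature contributes to one fit, the residual variance and the condition contrast draw on all measurements of the protein rather than on a per-feature summary. A two-sided p-value could then be obtained from a t distribution with `df` degrees of freedom (for example with `scipy.stats.t.sf`), and p-values across proteins would typically be adjusted for multiple testing.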
Pages: 17