Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial

Cited by: 68
Authors
Cuklina, Jelena [1, 2, 3, 4]
Lee, Chloe H. [1]
Williams, Evan G. [1, 5]
Sajic, Tatjana [1]
Collins, Ben C. [1, 6]
Martinez, Maria Rodriguez [4]
Sharma, Varun S. [1]
Wendt, Fabian [7]
Goetze, Sandra [7, 8, 9]
Keele, Gregory R. [10]
Wollscheid, Bernd [7, 8, 9]
Aebersold, Ruedi [1, 11]
Pedrioli, Patrick G. A. [1, 7, 8, 9]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[2] Univ Zurich, PhD Program Syst Biol, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] IBM Res Europe, Ruschlikon, Switzerland
[5] Univ Luxembourg, Luxembourg Ctr Syst Biomed, Luxembourg, Luxembourg
[6] Queens Univ Belfast, Belfast, Antrim, North Ireland
[7] Swiss Fed Inst Technol, Dept Hlth Sci & Technol, Inst Translat Med, Zurich, Switzerland
[8] Swiss Fed Inst Technol, PHRT CPAC, Zurich, Switzerland
[9] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[10] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[11] Univ Zurich, Fac Sci, Zurich, Switzerland
Funding
Swiss National Science Foundation; European Research Council;
Keywords
batch effects; data analysis; large-scale proteomics; normalization; quantitative proteomics; NORMALIZATION METHODS; MASS-SPECTROMETRY; GENE-EXPRESSION; PROTEOGENOMIC CHARACTERIZATION; STATISTICAL-ANALYSIS; OMICS DATA; R-PACKAGE; PLATFORM; DESIGN;
DOI
10.15252/msb.202110240
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology];
Discipline classification code
071010 ; 081704 ;
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
Pages: 16
Related papers
58 in total
[1]   On the design and analysis of gene expression studies in human populations [J].
Akey, Joshua M. ;
Biswas, Shameek ;
Leek, Jeffrey T. ;
Storey, John D. .
NATURE GENETICS, 2007, 39 (07) :807-808
[2]   Singular value decomposition for genome-wide expression data processing and modeling [J].
Alter, O ;
Brown, PO ;
Botstein, D .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2000, 97 (18) :10101-10106
[3]  
[Anonymous], 2007, Mathematical statistics and data analysis
[4]   Adjustment of systematic microarray data biases [J].
Benito, M ;
Parker, J ;
Du, Q ;
Wu, JY ;
Xiang, D ;
Perou, CM ;
Marron, JS .
BIOINFORMATICS, 2004, 20 (01) :105-114
[5]   A comparison of normalization methods for high density oligonucleotide array data based on variance and bias [J].
Bolstad, BM ;
Irizarry, RA ;
Åstrand, M ;
Speed, TP .
BIOINFORMATICS, 2003, 19 (02) :185-193
[6]   Multibatch TMT Reveals False Positives, Batch Effects and Missing Values [J].
Brenes, Alejandro ;
Hukelmann, Jens ;
Bensaddek, Dalila ;
Lamond, Angus I. .
MOLECULAR & CELLULAR PROTEOMICS, 2019, 18 (10) :1967-1980
[7]   Normalyzer: A Tool for Rapid Evaluation of Normalization Methods for Omics Data Sets [J].
Chawade, Aakash ;
Alexandersson, Erik ;
Levander, Fredrik .
JOURNAL OF PROTEOME RESEARCH, 2014, 13 (06) :3114-3120
[8]   Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods [J].
Chen, Chao ;
Grennan, Kay ;
Badner, Judith ;
Zhang, Dandan ;
Gershon, Elliot ;
Jin, Li ;
Liu, Chunyu .
PLOS ONE, 2011, 6 (02)
[9]   MSstats: an R package for statistical analysis of quantitative mass spectrometry-based proteomic experiments [J].
Choi, Meena ;
Chang, Ching-Yun ;
Clough, Timothy ;
Broudy, Daniel ;
Killeen, Trevor ;
MacLean, Brendan ;
Vitek, Olga .
BIOINFORMATICS, 2014, 30 (17) :2524-2526
[10]   Statistical protein quantification and significance analysis in label-free LC-MS experiments with complex designs [J].
Clough, Timothy ;
Thaminy, Safia ;
Ragg, Susanne ;
Aebersold, Ruedi ;
Vitek, Olga .
BMC BIOINFORMATICS, 2012, 13 :S6