Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial

Cited by: 62
Authors
Cuklina, Jelena [1, 2, 3, 4]
Lee, Chloe H. [1]
Williams, Evan G. [1, 5]
Sajic, Tatjana [1]
Collins, Ben C. [1, 6]
Martinez, Maria Rodriguez [4]
Sharma, Varun S. [1]
Wendt, Fabian [7]
Goetze, Sandra [7, 8, 9]
Keele, Gregory R. [10]
Wollscheid, Bernd [7, 8, 9]
Aebersold, Ruedi [1, 11]
Pedrioli, Patrick G. A. [1, 7, 8, 9]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[2] Univ Zurich, PhD Program Syst Biol, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] IBM Res Europe, Ruschlikon, Switzerland
[5] Univ Luxembourg, Luxembourg Ctr Syst Biomed, Luxembourg, Luxembourg
[6] Queens Univ Belfast, Belfast, Antrim, Northern Ireland
[7] Swiss Fed Inst Technol, Dept Hlth Sci & Technol, Inst Translat Med, Zurich, Switzerland
[8] Swiss Fed Inst Technol, PHRT CPAC, Zurich, Switzerland
[9] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[10] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[11] Univ Zurich, Fac Sci, Zurich, Switzerland
Funding
European Research Council; Swiss National Science Foundation
Keywords
batch effects; data analysis; large-scale proteomics; normalization; quantitative proteomics; NORMALIZATION METHODS; MASS-SPECTROMETRY; GENE-EXPRESSION; PROTEOGENOMIC CHARACTERIZATION; STATISTICAL-ANALYSIS; OMICS DATA; R-PACKAGE; PLATFORM; DESIGN;
DOI
10.15252/msb.202110240
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Subject classification code
071010; 081704
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
Pages: 16