Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial

Cited by: 62
Authors
Cuklina, Jelena [1, 2, 3, 4]
Lee, Chloe H. [1]
Williams, Evan G. [1, 5]
Sajic, Tatjana [1]
Collins, Ben C. [1, 6]
Martinez, Maria Rodriguez [4]
Sharma, Varun S. [1]
Wendt, Fabian [7]
Goetze, Sandra [7, 8, 9]
Keele, Gregory R. [10]
Wollscheid, Bernd [7, 8, 9]
Aebersold, Ruedi [1, 11]
Pedrioli, Patrick G. A. [1, 7, 8, 9]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[2] Univ Zurich, PhD Program Syst Biol, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] IBM Res Europe, Ruschlikon, Switzerland
[5] Univ Luxembourg, Luxembourg Ctr Syst Biomed, Luxembourg, Luxembourg
[6] Queens Univ Belfast, Belfast, Antrim, North Ireland
[7] Swiss Fed Inst Technol, Dept Hlth Sci & Technol, Inst Translat Med, Zurich, Switzerland
[8] Swiss Fed Inst Technol, PHRT CPAC, Zurich, Switzerland
[9] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[10] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[11] Univ Zurich, Fac Sci, Zurich, Switzerland
Funding
Swiss National Science Foundation; European Research Council
Keywords
batch effects; data analysis; large-scale proteomics; normalization; quantitative proteomics; NORMALIZATION METHODS; MASS-SPECTROMETRY; GENE-EXPRESSION; PROTEOGENOMIC CHARACTERIZATION; STATISTICAL-ANALYSIS; OMICS DATA; R-PACKAGE; PLATFORM; DESIGN;
DOI
10.15252/msb.202110240
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
Pages: 16