Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial

Cited by: 62
Authors
Cuklina, Jelena [1, 2, 3, 4]
Lee, Chloe H. [1]
Williams, Evan G. [1, 5]
Sajic, Tatjana [1]
Collins, Ben C. [1, 6]
Martinez, Maria Rodriguez [4]
Sharma, Varun S. [1]
Wendt, Fabian [7]
Goetze, Sandra [7, 8, 9]
Keele, Gregory R. [10]
Wollscheid, Bernd [7, 8, 9]
Aebersold, Ruedi [1, 11]
Pedrioli, Patrick G. A. [1, 7, 8, 9]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[2] Univ Zurich, PhD Program Syst Biol, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] IBM Res Europe, Ruschlikon, Switzerland
[5] Univ Luxembourg, Luxembourg Ctr Syst Biomed, Luxembourg, Luxembourg
[6] Queens Univ Belfast, Belfast, Antrim, North Ireland
[7] Swiss Fed Inst Technol, Dept Hlth Sci & Technol, Inst Translat Med, Zurich, Switzerland
[8] Swiss Fed Inst Technol, PHRT CPAC, Zurich, Switzerland
[9] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[10] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[11] Univ Zurich, Fac Sci, Zurich, Switzerland
Funding
Swiss National Science Foundation; European Research Council
Keywords
batch effects; data analysis; large-scale proteomics; normalization; quantitative proteomics; NORMALIZATION METHODS; MASS-SPECTROMETRY; GENE-EXPRESSION; PROTEOGENOMIC CHARACTERIZATION; STATISTICAL-ANALYSIS; OMICS DATA; R-PACKAGE; PLATFORM; DESIGN
DOI
10.15252/msb.202110240
Chinese Library Classification number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification code
071010; 081704
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
Pages: 16