Diagnostics and correction of batch effects in large-scale proteomic studies: a tutorial

Cited by: 62
Authors
Cuklina, Jelena [1, 2, 3, 4]
Lee, Chloe H. [1]
Williams, Evan G. [1, 5]
Sajic, Tatjana [1]
Collins, Ben C. [1, 6]
Martinez, Maria Rodriguez [4]
Sharma, Varun S. [1]
Wendt, Fabian [7]
Goetze, Sandra [7, 8, 9]
Keele, Gregory R. [10]
Wollscheid, Bernd [7, 8, 9]
Aebersold, Ruedi [1, 11]
Pedrioli, Patrick G. A. [1, 7, 8, 9]
Affiliations
[1] Swiss Fed Inst Technol, Dept Biol, Inst Mol Syst Biol, Zurich, Switzerland
[2] Univ Zurich, PhD Program Syst Biol, Zurich, Switzerland
[3] Swiss Fed Inst Technol, Zurich, Switzerland
[4] IBM Res Europe, Ruschlikon, Switzerland
[5] Univ Luxembourg, Luxembourg Ctr Syst Biomed, Luxembourg, Luxembourg
[6] Queens Univ Belfast, Belfast, Antrim, North Ireland
[7] Swiss Fed Inst Technol, Dept Hlth Sci & Technol, Inst Translat Med, Zurich, Switzerland
[8] Swiss Fed Inst Technol, PHRT CPAC, Zurich, Switzerland
[9] SIB Swiss Inst Bioinformat, Lausanne, Switzerland
[10] Jackson Lab, 600 Main St, Bar Harbor, ME 04609 USA
[11] Univ Zurich, Fac Sci, Zurich, Switzerland
Funding
European Research Council; Swiss National Science Foundation
Keywords
batch effects; data analysis; large-scale proteomics; normalization; quantitative proteomics; NORMALIZATION METHODS; MASS-SPECTROMETRY; GENE-EXPRESSION; PROTEOGENOMIC CHARACTERIZATION; STATISTICAL-ANALYSIS; OMICS DATA; R-PACKAGE; PLATFORM; DESIGN;
DOI
10.15252/msb.202110240
Chinese Library Classification (CLC) number
Q5 [Biochemistry]; Q7 [Molecular Biology]
Discipline classification codes
071010; 081704
Abstract
Advancements in mass spectrometry-based proteomics have enabled experiments encompassing hundreds of samples. While these large sample sets deliver much-needed statistical power, handling them introduces technical variability known as batch effects. Here, we present a step-by-step protocol for the assessment, normalization, and batch correction of proteomic data. We review established methodologies from related fields and describe solutions specific to proteomic challenges, such as ion intensity drift and missing values in quantitative feature matrices. Finally, we compile a set of techniques that enable control of batch effect adjustment quality. We provide an R package, "proBatch", containing functions required for each step of the protocol. We demonstrate the utility of this methodology on five proteomic datasets each encompassing hundreds of samples and consisting of multiple experimental designs. In conclusion, we provide guidelines and tools to make the extraction of true biological signal from large proteomic studies more robust and transparent, ultimately facilitating reliable and reproducible research in clinical proteomics and systems biology.
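The workflow named in the abstract (assessment, normalization, batch correction) can be illustrated with a minimal Python sketch. This is not the proBatch API and not necessarily the authors' method: the function names, the toy data, and the choice of median-based shifts are all assumptions made for illustration. It assumes log2-transformed intensities and represents missing values as `None`, which are left missing rather than imputed.

```python
from statistics import median

def median_normalize(matrix):
    """Shift each sample so that all samples share the same median.

    matrix: dict of sample_id -> list of log2 intensities (None = missing).
    """
    sample_medians = {
        s: median(v for v in vals if v is not None)
        for s, vals in matrix.items()
    }
    global_median = median(sample_medians.values())
    return {
        s: [None if v is None else v - sample_medians[s] + global_median
            for v in vals]
        for s, vals in matrix.items()
    }

def center_by_batch(matrix, batch_of):
    """Per feature, move each batch median onto the feature's overall median."""
    n_features = len(next(iter(matrix.values())))
    corrected = {s: list(vals) for s, vals in matrix.items()}
    for i in range(n_features):
        observed = [vals[i] for vals in matrix.values() if vals[i] is not None]
        if not observed:
            continue  # feature missing everywhere: nothing to correct
        overall = median(observed)
        for batch in set(batch_of.values()):
            members = [s for s in matrix if batch_of[s] == batch]
            in_batch = [matrix[s][i] for s in members
                        if matrix[s][i] is not None]
            if not in_batch:
                continue  # feature not measured in this batch
            shift = median(in_batch) - overall
            for s in members:
                if matrix[s][i] is not None:
                    corrected[s][i] = matrix[s][i] - shift
    return corrected

def batch_gap(matrix, batch_of, feature):
    """Diagnostic: range of the per-batch medians for one feature."""
    per_batch = {}
    for s, vals in matrix.items():
        if vals[feature] is not None:
            per_batch.setdefault(batch_of[s], []).append(vals[feature])
    batch_medians = [median(v) for v in per_batch.values()]
    return max(batch_medians) - min(batch_medians)

# Hypothetical toy data: four samples, three features, two batches.
batch_of = {"s1": "A", "s2": "A", "s3": "B", "s4": "B"}
raw = {
    "s1": [10.0, 12.0, 8.0],
    "s2": [10.2, 12.1, None],   # one missing value
    "s3": [11.0, 13.1, 9.0],
    "s4": [11.3, 13.0, 9.2],
}

normalized = median_normalize(raw)
corrected = center_by_batch(normalized, batch_of)
gaps_before = [batch_gap(normalized, batch_of, i) for i in range(3)]
gaps_after = [batch_gap(corrected, batch_of, i) for i in range(3)]
```

Comparing `gaps_before` with `gaps_after` is a crude stand-in for the quality control step: a correction that works should shrink the between-batch gap. Note that in this toy data the missing value in `s2` shifts that sample's median, so normalization alone does not remove the gap, which is one reason missing values need separate care.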
Pages: 16
Related papers
50 in total
  • [1] Assessing and mitigating batch effects in large-scale omics studies
    Yu, Ying
    Mai, Yuanbang
    Zheng, Yuanting
    Shi, Leming
    GENOME BIOLOGY, 2024, 25 (01)
  • [2] An optimized guanidination method for large-scale proteomic studies
    Ye, Juanying
    Zhang, Yang
    Huang, Lin
    Li, Qingqing
    Huang, Jingnan
    Lu, Jianan
    Li, Yanhong
    Zhang, Xumin
    PROTEOMICS, 2016, 16 (13) : 1837 - 1846
  • [3] Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method
    Yu, Ying
    Zhang, Naixin
    Mai, Yuanbang
    Ren, Luyao
    Chen, Qiaochu
    Cao, Zehui
    Chen, Qingwang
    Liu, Yaqing
    Hou, Wanwan
    Yang, Jingcheng
    Hong, Huixiao
    Xu, Joshua
    Tong, Weida
    Dong, Lianhua
    Shi, Leming
    Fang, Xiang
    Zheng, Yuanting
    GENOME BIOLOGY, 2023, 24 (01)
  • [4] An anchored experimental design and meta-analysis approach to address batch effects in large-scale metabolomics
    Shaver, Amanda O.
    Garcia, Brianna M.
    Gouveia, Goncalo J.
    Morse, Alison M.
    Liu, Zihao
    Asef, Carter K.
    Borges, Ricardo M.
    Leach, Franklin E., III
    Andersen, Erik C.
    Amster, I. Jonathan
    Fernandez, Facundo M.
    Edison, Arthur S.
    McIntyre, Lauren M.
    FRONTIERS IN MOLECULAR BIOSCIENCES, 2022, 9
  • [5] STEM: A software tool for large-scale proteomic data analyses
    Shinkawa, T
    Taoka, M
    Yamauchi, Y
    Ichimura, T
    Kaji, H
    Takahashi, N
    Isobe, T
    JOURNAL OF PROTEOME RESEARCH, 2005, 4 (05) : 1826 - 1831
  • [6] Tag-Count Analysis of Large-Scale Proteomic Data
    Branson, Owen E.
    Freitas, Michael A.
    JOURNAL OF PROTEOME RESEARCH, 2016, 15 (12) : 4742 - 4746
  • [7] Large-scale proteomic analysis of membrane proteins
    Ahram, M
    Springer, DL
    EXPERT REVIEW OF PROTEOMICS, 2004, 1 (03) : 293 - 302
  • [8] WaveICA: A novel algorithm to remove batch effects for large-scale untargeted metabolomics data based on wavelet analysis
    Deng, Kui
    Zhang, Fan
    Tan, Qilong
    Huang, Yue
    Song, Wei
    Rong, Zhiwei
    Zhu, Zheng-Jiang
    Li, Zhenzi
    Li, Kang
    ANALYTICA CHIMICA ACTA, 2019, 1061 : 60 - 69
  • [9] Peptide correlation: A means to identify high quality quantitative information in large-scale proteomic studies
    Schwarz, Emanuel
    Levin, Yishai
    Wang, Lan
    Leweke, F. Markus
    Bahn, Sabine
    JOURNAL OF SEPARATION SCIENCE, 2007, 30 (14) : 2190 - 2197
  • [10] Longitudinal Large-Scale Semiquantitative Proteomic Data Stability Across Multiple Instrument Platforms
    Lu, Congcong
    Glisovic-Aplenc, Tina
    Bernt, Kathrin M.
    Nestler, Kevin
    Cesare, Joseph
    Cao, Lusha
    Lee, Hyoungjoo
    Fazelinia, Hossein
    Chinwalla, Asif
    Xu, Yang
    Shestova, Olga
    Xing, Yi
    Gill, Saar
    Li, Mingyao
    Garcia, Benjamin
    Aplenc, Richard
    JOURNAL OF PROTEOME RESEARCH, 2021, 20 (11) : 5203 - 5211