Correcting batch effects in large-scale multiomics studies using a reference-material-based ratio method

Cited by: 29
Authors
Yu, Ying [1]
Zhang, Naixin [1]
Mai, Yuanbang [1]
Ren, Luyao [1]
Chen, Qiaochu [1]
Cao, Zehui [1]
Chen, Qingwang [1]
Liu, Yaqing [1]
Hou, Wanwan [1]
Yang, Jingcheng [1, 2]
Hong, Huixiao [3]
Xu, Joshua [3]
Tong, Weida [3]
Dong, Lianhua [4]
Shi, Leming [1, 5]
Fang, Xiang [4]
Zheng, Yuanting [1]
Affiliations
[1] Fudan Univ, Shanghai Canc Ctr, State Key Lab Genet Engn, Sch Life Sci & Human Phenome Inst, Shanghai, Peoples R China
[2] Greater Bay Area Inst Precis Med, Guangzhou, Guangdong, Peoples R China
[3] US FDA, Natl Ctr Toxicol Res, Div Bioinformat & Biostat, Jefferson, AR USA
[4] Natl Inst Metrol, Beijing, Peoples R China
[5] Int Human Phenome Inst, Shanghai, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Batch effect; Ratio; Reference materials; Multiomics; Phenomics; Differentially expressed; Prediction; Data integration; Quartet family; Metrology; GENE-EXPRESSION; REPRODUCIBILITY; DISCOVERY; PLATFORM; CANCER;
DOI
10.1186/s13059-023-03047-z
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Background: Batch effects are notoriously common technical variations in multiomics data and may lead to misleading outcomes if left uncorrected or if over-corrected. A plethora of batch-effect correction algorithms have been proposed to facilitate data integration. However, their respective advantages and limitations have not been adequately assessed in terms of omics types, performance metrics, and application scenarios.
Results: As part of the Quartet Project for quality control and data integration of multiomics profiling, we comprehensively assess the performance of seven batch-effect correction algorithms based on different performance metrics of clinical relevance, i.e., the accuracy of identifying differentially expressed features, the robustness of predictive models, and the ability to accurately cluster cross-batch samples by their donors. The ratio-based method, i.e., scaling the absolute feature values of study samples relative to those of concurrently profiled reference material(s), is found to be much more effective and broadly applicable than the others, especially when batch effects are completely confounded with the biological factors of study interest. We further provide practical guidelines for implementing the ratio-based approach in increasingly large-scale multiomics studies.
Conclusions: Multiomics measurements are prone to batch effects, which can be effectively corrected using ratio-based scaling of the multiomics data. Our study lays the foundation for eliminating batch effects at a ratio scale.
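The ratio-based scaling described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `ratio_scale`, the use of log2 values, and averaging over reference replicates within each batch are all assumptions made for illustration, since the abstract only states that study-sample values are scaled relative to concurrently profiled reference material(s).

```python
import numpy as np

def ratio_scale(values, batches, is_reference):
    """Hypothetical helper: express each sample as a log2 ratio to the
    reference material profiled in the same batch.

    values       : (n_samples, n_features) array of positive feature values
    batches      : length n_samples array of batch labels
    is_reference : boolean mask marking reference-material samples
    """
    values = np.asarray(values, dtype=float)
    batches = np.asarray(batches)
    is_reference = np.asarray(is_reference, dtype=bool)

    log_values = np.log2(values)
    scaled = np.empty_like(log_values)
    for batch in np.unique(batches):
        in_batch = batches == batch
        ref = log_values[in_batch & is_reference]
        if ref.shape[0] == 0:
            raise ValueError(f"batch {batch!r} has no reference sample")
        # Assumption: average the reference replicates per feature, then
        # subtract on the log scale (equivalent to dividing raw values).
        scaled[in_batch] = log_values[in_batch] - ref.mean(axis=0)
    return scaled

# Toy data: batch "B" carries a 4x multiplicative shift relative to batch "A".
values = [
    [100.0, 200.0],  # batch A, reference material
    [200.0, 100.0],  # batch A, study sample
    [400.0, 800.0],  # batch B, reference material
    [800.0, 400.0],  # batch B, study sample (same biology as the A sample)
]
batches = ["A", "A", "B", "B"]
is_reference = [True, False, True, False]

scaled = ratio_scale(values, batches, is_reference)
```

In this toy case the two study samples differ fourfold in absolute values but have identical ratio-scale profiles, because the batch-specific multiplicative shift cancels against the reference measured in the same batch.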
Pages: 26