Methods that remove batch effects while retaining group differences may lead to exaggerated confidence in downstream analyses

Cited by: 229
Authors
Nygaard, Vegard [1]
Rodland, Einar Andreas [1]
Hovig, Eivind [1, 2, 3]
Affiliations
[1] Oslo Univ Hosp HF, Radiumhosp, Dept Tumor Biol, Inst Canc Res, N-0310 Oslo, Norway
[2] Oslo Univ Hosp HF, Radiumhosp, Inst Canc Genet & Informat, N-0310 Oslo, Norway
[3] Univ Oslo, Dept Informat, N-0316 Oslo, Norway
Keywords
Batch effects; Data normalization; Microarrays; Reproducible research; MICROARRAY; IMPACT
DOI
10.1093/biostatistics/kxv027
Chinese Library Classification (CLC) number
Q [Biological sciences]
Discipline classification code
07; 0710; 09
Abstract
Removal of, or adjustment for, batch effects or center differences is generally required when such effects are present in data. In particular, when preparing microarray gene expression data from multiple cohorts, array platforms, or batches for later analyses, batch effects can have confounding effects, inducing spurious differences between study groups. Many methods and tools exist for removing batch effects from data. However, when study groups are not evenly distributed across batches, actual group differences may induce apparent batch differences, in which case batch adjustments may bias, usually deflate, group differences. Some tools therefore have the option of preserving the difference between study groups, e.g. using a two-way ANOVA model to simultaneously estimate both group and batch effects. Unfortunately, this approach may systematically induce incorrect group differences in downstream analyses when groups are distributed between the batches in an unbalanced manner. The scientific community seems to be largely unaware of how this approach may lead to false discoveries.
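The effect described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' own analysis: the data are pure noise (no true group or batch effect), the design (9 + 1 samples in one batch, 1 + 9 in the other), the function names, and the hard-coded critical value are all hypothetical choices made for illustration. A two-way additive model is fitted per gene, only the estimated batch term is subtracted, and a plain two-sample t-test is then run on the adjusted data as if it were batch-free.

```python
import numpy as np

# Two-sided 5% critical value of Student's t with 18 degrees of freedom
# (20 samples, 2 groups). Hard-coded here to avoid extra dependencies.
T_CRIT_DF18 = 2.101


def pooled_t(y, group):
    """Per-gene two-sample t statistic with pooled variance (rows = samples)."""
    a, b = y[group == 0], y[group == 1]
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * a.var(axis=0, ddof=1)
                  + (nb - 1) * b.var(axis=0, ddof=1)) / (na + nb - 2)
    return (b.mean(axis=0) - a.mean(axis=0)) / np.sqrt(pooled_var * (1 / na + 1 / nb))


def simulate_false_positive_rates(n_genes=2000, seed=0):
    """Return (rate on raw data, rate on batch-adjusted data) for pure noise."""
    rng = np.random.default_rng(seed)

    # Assumed unbalanced design: batch 0 holds 9 samples of group 0 and 1 of
    # group 1; batch 1 holds 1 sample of group 0 and 9 of group 1.
    group = np.array([0] * 9 + [1] * 1 + [0] * 1 + [1] * 9, dtype=float)
    batch = np.array([0] * 10 + [1] * 10, dtype=float)
    n = group.size

    # Null data: no group effect and no batch effect in any gene.
    y = rng.standard_normal((n, n_genes))

    # Two-way additive model y ~ intercept + group + batch, fitted per gene.
    design = np.column_stack([np.ones(n), group, batch])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)

    # Subtract only the estimated batch term; the group term is "preserved".
    y_adjusted = y - np.outer(batch - batch.mean(), coef[2])

    # Downstream analysis that treats the adjusted data as batch-free.
    raw_rate = np.mean(np.abs(pooled_t(y, group)) > T_CRIT_DF18)
    adjusted_rate = np.mean(np.abs(pooled_t(y_adjusted, group)) > T_CRIT_DF18)
    return raw_rate, adjusted_rate


if __name__ == "__main__":
    raw, adjusted = simulate_false_positive_rates()
    print(f"false positive rate, raw data:      {raw:.3f}")
    print(f"false positive rate, adjusted data: {adjusted:.3f}")
```

In this sketch the test on raw data should reject at close to the nominal 5% rate, while the test on adjusted data rejects far more often. The adjusted data carry the group estimate of the two-way model, whose variance is inflated by the group/batch imbalance, but the downstream t-test is unaware of that inflation and of the degree of freedom spent on the batch term. Including batch as a covariate in the final test, rather than testing pre-adjusted data, avoids this mismatch in the simulated setting.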
Pages: 29-39
Number of pages: 11