Why Batch Effects Matter in Omics Data, and How to Avoid Them

Cited by: 255
Authors
Goh, Wilson Wen Bin [1, 2]
Wang, Wei [1]
Wong, Limsoon [2, 3]
Affiliations
[1] Tianjin Univ, Sch Pharmaceut Sci & Technol, Tianjin 300072, Peoples R China
[2] Natl Univ Singapore, Dept Comp Sci, Singapore 117417, Singapore
[3] Natl Univ Singapore, Dept Pathol, Singapore 119074, Singapore
Keywords
SURROGATE VARIABLE ANALYSIS; GENE-EXPRESSION; UNWANTED VARIATION; MICROARRAY DATA; DISCOVERY; HETEROGENEITY; RANDOMIZATION; IMPROVES
DOI
10.1016/j.tibtech.2017.02.012
Chinese Library Classification (CLC) number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Discipline classification codes
071005; 0836; 090102; 100705
Abstract
Effective integration and analysis of new high-throughput data, especially gene-expression and proteomic-profiling data, are expected to deliver novel clinical insights and therapeutic options. Unfortunately, technical heterogeneity or batch effects (different experiment times, handlers, reagent lots, etc.) have proven challenging. Although batch effect-correction algorithms (BECAs) exist, we know little about effective batch-effect mitigation: even now, new batch effect-associated problems are emerging. These include false effects due to misapplying BECAs and positive bias during model evaluations. Depending on the choice of algorithm and experimental set-up, biological heterogeneity can be mistaken for batch effects and wrongfully removed. Here, we examine these emerging batch effect-associated problems, propose a series of best practices, and discuss some of the challenges that lie ahead.
Pages: 498-507
Page count: 10