Correcting gene expression data when neither the unwanted variation nor the factor of interest are observed

Cited by: 64
Authors
Jacob, Laurent [1 ]
Gagnon-Bartsch, Johann A. [2 ]
Speed, Terence P. [2 ,3 ]
Affiliations
[1] Univ Lyon 1, UMR, CNRS, Lab Biometrie & Biol Evolut, F-5558 Lyon, France
[2] Univ Calif Berkeley, Dept Stat, Berkeley, CA USA
[3] Walter & Eliza Hall Inst Med Res, Div Bioinformat, Melbourne, Vic 3052, Australia
Funding
UK Medical Research Council
Keywords
Batch effect; Control genes; Gene expression; Normalization; Replicate samples
DOI
10.1093/biostatistics/kxv026
Chinese Library Classification number
Q [Biological sciences]
Discipline classification codes
07; 0710; 09
Abstract
In large-scale gene expression studies, observations are commonly contaminated by sources of unwanted variation such as platforms or batches. Ignoring this unwanted variation when analyzing the data can lead to spurious associations and to missed signals. When the analysis is unsupervised, e.g. when the goal is to cluster the samples or to build a corrected version of the dataset (as opposed to studying an observed factor of interest), taking unwanted variation into account can become a difficult task. The factors driving unwanted variation may be correlated with the unobserved factor of interest, so that correcting for the former can remove the latter if not done carefully. We show how negative control genes and replicate samples can be used to estimate unwanted variation in gene expression, and discuss how this information can be used to correct the expression data. The proposed methods are then evaluated on synthetic data and three gene expression datasets. They generally manage to remove unwanted variation without losing the signal of interest and compare favorably to state-of-the-art corrections. All proposed methods are implemented in the Bioconductor package RUVnormalize.
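As a rough illustration of the control-gene idea mentioned in the abstract, the following Python sketch estimates unwanted factors from negative control genes with an SVD and regresses them out of every gene. It is a simplified stand-in under stated assumptions, not the RUVnormalize implementation (which is an R package): the function name `naive_control_gene_correction`, the number of factors `k`, and the simulated data are all hypothetical. As the abstract warns, this kind of plain regression can also remove signal when the unwanted factors are correlated with the factor of interest; the simulation below deliberately avoids that case.

```python
import numpy as np

def naive_control_gene_correction(Y, control_idx, k):
    """Hypothetical helper: Y is samples x genes, control_idx indexes
    genes assumed to be unaffected by the factor of interest."""
    Y = Y - Y.mean(axis=0)                    # center each gene
    # Variation in control genes is assumed to be purely unwanted,
    # so their leading left singular vectors estimate the unwanted factors W.
    U, s, _ = np.linalg.svd(Y[:, control_idx], full_matrices=False)
    W = U[:, :k] * s[:k]
    # Least-squares fit of every gene on W, then subtract the fitted part.
    alpha, *_ = np.linalg.lstsq(W, Y, rcond=None)
    return Y - W @ alpha, W

# Simulated data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, p, n_ctrl = 20, 50, 10
batch = np.repeat([0.0, 1.0], n // 2)[:, None]    # two batches of 10 samples
loadings = 2.0 * rng.normal(size=(1, p))          # per-gene batch effect
signal = np.zeros((n, p))
signal[::2, n_ctrl:] = 1.0                        # biology, balanced across batches
Y = batch @ loadings + signal + 0.1 * rng.normal(size=(n, p))

Y_corrected, W = naive_control_gene_correction(Y, np.arange(n_ctrl), k=1)

def batch_gap(M):
    """Mean absolute difference between the two batch means, over genes."""
    return np.abs(M[: n // 2].mean(axis=0) - M[n // 2 :].mean(axis=0)).mean()

raw_gap, corrected_gap = batch_gap(Y), batch_gap(Y_corrected)
```

Comparing `raw_gap` with `corrected_gap` gives a quick check that the batch difference shrinks after correction. The replicate-sample estimators and the more careful corrections described in the abstract are not reproduced here.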
Pages: 16-28
Page count: 13