Using control genes to correct for unwanted variation in microarray data

Cited by: 288
Authors
Gagnon-Bartsch, Johann A. [1]
Speed, Terence P. [1, 2]
Affiliations
[1] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[2] Walter & Eliza Hall Inst Med Res, Bioinformat Div, Melbourne, Vic 3050, Australia
Funding
US National Institutes of Health; US National Science Foundation
Keywords
Batch effect; Control gene; Differential expression; Factor analysis; SVA; Unwanted variation; QUALITY ASSESSMENT; EXPRESSION; NORMALIZATION; SUMMARIES; VARIANCE; MODEL
DOI
10.1093/biostatistics/kxr034
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Microarray expression studies suffer from the problem of batch effects and other unwanted variation. Many methods have been proposed to adjust microarray data to mitigate the problems of unwanted variation. Several of these methods rely on factor analysis to infer the unwanted variation from the data. A central problem with this approach is the difficulty in discerning the unwanted variation from the biological variation that is of interest to the researcher. We present a new method, intended for use in differential expression studies, that attempts to overcome this problem by restricting the factor analysis to negative control genes. Negative control genes are genes known a priori not to be differentially expressed with respect to the biological factor of interest. Variation in the expression levels of these genes can therefore be assumed to be unwanted variation. We name this method "Remove Unwanted Variation, 2-step" (RUV-2). We discuss various techniques for assessing the performance of an adjustment method and compare the performance of RUV-2 with that of other commonly used adjustment methods such as ComBat and Surrogate Variable Analysis (SVA). We present several example studies, each concerning genes differentially expressed with respect to gender in the brain, and find that RUV-2 performs as well as or better than other methods. Finally, we discuss the possibility of adapting RUV-2 for use in studies not concerned with differential expression and conclude that there may be promise but substantial challenges remain.
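The two-step idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the factor analysis is a plain singular value decomposition of the control-gene columns, that the second step is an ordinary least-squares fit of each gene on the factor of interest plus the estimated unwanted factors, and that the number of unwanted factors `k` is supplied by the user. The function name `ruv2_sketch`, its parameters, and the simulated data are all hypothetical.

```python
import numpy as np


def ruv2_sketch(Y, X, control_idx, k):
    """Two-step adjustment sketch (assumed details, not the published code).

    Y           : (m samples, n genes) expression matrix
    X           : (m, p) factor(s) of interest, e.g. gender coded as -1/+1
    control_idx : column indices of negative control genes
    k           : assumed number of unwanted factors
    """
    # Step 1: factor analysis restricted to the negative control genes.
    # Here this is simply the top-k left singular vectors (scaled) of Y_c.
    Yc = Y[:, control_idx]
    U, s, _ = np.linalg.svd(Yc, full_matrices=False)
    W_hat = U[:, :k] * s[:k]

    # Step 2: regress every gene on [X, W_hat]; keep the coefficients on X.
    design = np.hstack([X, W_hat])
    coef, *_ = np.linalg.lstsq(design, Y, rcond=None)
    beta_hat = coef[: X.shape[1]]
    return beta_hat, W_hat


# Simulated data (hypothetical): 40 samples, 200 genes, one unwanted factor.
rng = np.random.default_rng(0)
m, n, k = 40, 200, 1
X = np.repeat([-1.0, 1.0], m // 2).reshape(m, 1)
beta = np.zeros((1, n))
beta[0, 150:] = 2.0                      # last 50 genes are truly differential
W = rng.normal(size=(m, k)) + 0.5 * X    # unwanted factor, partly confounded with X
alpha = rng.normal(size=(k, n))          # gene-specific loadings on W
Y = X @ beta + W @ alpha + 0.1 * rng.normal(size=(m, n))

control_idx = np.arange(100)             # first 100 genes treated as negative controls
beta_hat, W_hat = ruv2_sketch(Y, X, control_idx, k)
```

Because the unwanted factor is estimated only from genes assumed to carry no signal of interest, the estimate in this toy setting does not absorb the effect of `X` on the truly differential genes. In practice the choice of control genes and of `k` matters a great deal, and the sketch does not address either.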
Pages: 539-552
Page count: 14