Covariance adjustment for batch effect in gene expression data

Cited by: 10
Authors
Lee, Jung Ae [1]
Dobbin, Kevin K. [2 ]
Ahn, Jeongyoun [3 ]
Affiliations
[1] Washington Univ, Div Publ Hlth Sci, St Louis, MO 63110 USA
[2] Univ Georgia, Dept Epidemiol & Biostat, Athens, GA 30605 USA
[3] Univ Georgia, Dept Stat, Athens, GA 30602 USA
Funding
National Institutes of Health (USA);
Keywords
batch effect; factor model; gene expression; high-dimensional covariance estimation; SURVIVAL PREDICTION; REGULARIZATION;
DOI
10.1002/sim.6157
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Batch bias has been found in many microarray gene expression studies that involve multiple batches of samples. A serious batch effect can alter not only the distribution of individual genes but also the inter-gene relationships. Even though some efforts have been made to remove such bias, there has been relatively less development on a multivariate approach, mainly because of the analytical difficulty due to the high-dimensional nature of gene expression data. We propose a multivariate batch adjustment method that effectively eliminates inter-gene batch effects. The proposed method utilizes high-dimensional sparse covariance estimation based on a factor model and a hard thresholding. Another important aspect of the proposed method is that if it is known that one of the batches is produced in a superior condition, the other batches can be adjusted so that they resemble the target batch. We study high-dimensional asymptotic properties of the proposed estimator and compare the performance of the proposed method with some popular existing methods with simulated data and gene expression data sets. Copyright (c) 2014 John Wiley & Sons, Ltd.
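The abstract names two ingredients: a sparse covariance estimate built from a factor model with hard thresholding, and an adjustment that makes other batches resemble a target batch. The sketch below illustrates one way these ideas can fit together; it is not the authors' estimator. The function names, the tuning parameters `k` (number of factors, at least 1) and `tau` (threshold level), the eigenvalue floor, and the whitening-and-recoloring transform used for the adjustment are all assumptions made for illustration.

```python
import numpy as np


def factor_threshold_cov(X, k, tau):
    """Covariance estimate: rank-k factor part plus hard-thresholded residual.

    X is an (n_samples, n_genes) array. k (>= 1) and tau are hypothetical
    tuning parameters, not values taken from the paper.
    """
    S = np.cov(X, rowvar=False)                     # sample covariance, p x p
    vals, vecs = np.linalg.eigh(S)                  # eigenvalues in ascending order
    top_vals, top_vecs = vals[-k:], vecs[:, -k:]
    low_rank = (top_vecs * top_vals) @ top_vecs.T   # factor (low-rank) component
    resid = S - low_rank
    # Hard thresholding: zero small off-diagonal entries, keep the diagonal.
    kept = np.where(np.abs(resid) >= tau, resid, 0.0)
    np.fill_diagonal(kept, np.diag(resid))
    return low_rank + kept


def _sym_power(M, power, floor=1e-8):
    """Power of a symmetric matrix, with eigenvalues floored (an assumption)."""
    vals, vecs = np.linalg.eigh(M)
    vals = np.clip(vals, floor, None)
    return (vecs * vals**power) @ vecs.T


def adjust_to_target(X_batch, X_target, k, tau):
    """Shift and rescale X_batch so its mean and covariance match the target.

    Assumed transform: whiten with the batch covariance estimate, then
    recolor with the target covariance estimate.
    """
    mu_b, mu_t = X_batch.mean(axis=0), X_target.mean(axis=0)
    cov_b = factor_threshold_cov(X_batch, k, tau)
    cov_t = factor_threshold_cov(X_target, k, tau)
    T = _sym_power(cov_t, 0.5) @ _sym_power(cov_b, -0.5)
    return mu_t + (X_batch - mu_b) @ T.T
```

With `tau=0` nothing is thresholded and the estimate reduces to the sample covariance, which is only usable when there are more samples than genes. In the high-dimensional setting the abstract targets, a positive `tau` and a small `k` are what keep the estimate sparse and better conditioned.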
Pages: 2681-2695
Page count: 15