Algorithm-driven artifacts in median polish summarization of microarray data

Cited by: 43
Authors
Giorgi, Federico M. [1 ]
Bolger, Anthony M. [1 ]
Lohse, Marc [1 ]
Usadel, Bjoern [1 ]
Affiliations
[1] Max Planck Inst Mol Plant Physiol, D-14476 Golm, Germany
Keywords
DENSITY OLIGONUCLEOTIDE ARRAY; GENECHIP EXPRESSION MEASURES; PROBE LEVEL DATA; NORMALIZATION; COEXPRESSION; NETWORKS; PATTERNS; BIOLOGY; CANCER;
DOI
10.1186/1471-2105-11-553
Chinese Library Classification
Q5 [Biochemistry];
Subject classification code
071010 ; 081704 ;
Abstract
Background: High-throughput measurement of transcript intensities using Affymetrix-type oligonucleotide microarrays has produced a massive quantity of data during the last decade. Different preprocessing techniques exist to convert the raw signal intensities measured by these chips into gene expression estimates. Although these techniques have been widely benchmarked in the context of differential gene expression analysis, there are only a few examples where their performance has been assessed with respect to coexpression-based studies such as sample classification. Results: In the present paper we benchmark the three most widely used normalization procedures (MAS5, RMA and GCRMA) in the context of inter-array correlation analysis, confirming and extending the finding that RMA and GCRMA consistently overestimate sample similarity upon normalization. We determine that median polish summarization is responsible for generating a large proportion of these over-similarity artifacts. Furthermore, we show that most affected probesets also show internal signal disagreement, and tend to be composed of individual probes hitting different gene transcripts. Finally, we provide a correction to the RMA/GCRMA summarization procedure that massively reduces inter-array correlation artifacts without affecting the detection of differentially expressed genes. Conclusions: We propose tRMA as a modification of RMA to normalize microarray experiments for correlation-based analysis.
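For readers unfamiliar with the summarization step named in the abstract, the following is a minimal sketch of Tukey's median polish applied to a single probeset, assuming a matrix of log2 intensities with probes in rows and arrays in columns. The function names `median_polish` and `summarize_probeset`, the iteration limit, and the tolerance are illustrative assumptions. This is not the authors' code, and it does not implement the tRMA correction.

```python
import numpy as np

def median_polish(x, max_iter=10, tol=1e-6):
    """Fit x[i, j] ~ overall + row[i] + col[j] by alternating median sweeps.

    Returns (overall, row, col, resid) such that
    x == overall + row[:, None] + col[None, :] + resid.
    """
    resid = np.asarray(x, dtype=float).copy()
    overall = 0.0
    row = np.zeros(resid.shape[0])  # probe effects
    col = np.zeros(resid.shape[1])  # array effects
    for _ in range(max_iter):
        # Sweep row medians out of the residuals into the row effects.
        rmed = np.median(resid, axis=1)
        resid -= rmed[:, None]
        row += rmed
        # Re-centre the column effects, moving their median into the overall term.
        delta = np.median(col)
        col -= delta
        overall += delta
        # Sweep column medians out of the residuals into the column effects.
        cmed = np.median(resid, axis=0)
        resid -= cmed[None, :]
        col += cmed
        # Re-centre the row effects in the same way.
        delta = np.median(row)
        row -= delta
        overall += delta
        if np.abs(rmed).max() < tol and np.abs(cmed).max() < tol:
            break
    return overall, row, col, resid

def summarize_probeset(x):
    """Per-array expression estimate: overall level plus array effect."""
    overall, _, col, _ = median_polish(x)
    return overall + col
```

Given such per-array summaries for many probesets, inter-array similarity of the kind discussed in the abstract could then be examined with, for example, `np.corrcoef` on the resulting array-by-probeset matrix.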
Pages: 12