Global rank-invariant set normalization (GRSN) to reduce systematic distortions in microarray data

Cited by: 44
Authors
Pelz, Carl R. [1]
Kulesz-Martin, Molly [2, 3, 5]
Bagby, Grover [1, 4, 5]
Sears, Rosalie C. [1, 5]
Affiliations
[1] Oregon Hlth & Sci Univ, Dept Mol & Med Genet, Portland, OR 97239 USA
[2] Oregon Hlth & Sci Univ, Dept Dermatol, Portland, OR 97239 USA
[3] Oregon Hlth & Sci Univ, Dept Cell & Dev Biol, Portland, OR 97239 USA
[4] Oregon Hlth & Sci Univ, Dept Med, Portland, OR 97239 USA
[5] Oregon Hlth & Sci Univ, OHSU Knight Canc Inst, Portland, OR 97239 USA
Funding
National Institutes of Health (NIH), USA
DOI
10.1186/1471-2105-9-520
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Background: Microarray technology has become very popular for globally evaluating gene expression in biological samples. However, non-linear variation associated with the technology can make data interpretation unreliable, so methods to correct this kind of technical variation are critical. Here we consider a method to reduce this type of variation, applied after three common procedures for processing microarray data: MAS 5.0, RMA, and dChip (R).
Results: We commonly observe intensity-dependent technical variation between samples in a single microarray experiment. This is most common when MAS 5.0 is used to process probe-level data, but we also see this type of technical variation with RMA and dChip (R) processed data. Datasets with unbalanced numbers of up- and down-regulated genes seem to be particularly susceptible to this type of intensity-dependent technical variation. Unbalanced gene regulation is common when studying cancer samples or genetically manipulated animal models, and preserving this biologically relevant information while removing technical variation has not been well addressed in the literature. We propose a method, global rank-invariant set normalization (GRSN), that uses rank-invariant, endogenous transcripts as reference points for normalization. While the use of rank-invariant transcripts has been described previously, we extend the concept by creating a global rank-invariant set of transcripts, which is used to generate a robust average reference against which all samples within a dataset are normalized. The global rank-invariant set is selected in an iterative manner so as to preserve unbalanced gene expression. Moreover, our method works well as an overlay that can be applied to data already processed with other probe set summary methods. We demonstrate that this additional normalization step at the "probe set level" effectively corrects a specific type of technical variation that often distorts samples in datasets.
Conclusion: We have developed a simple post-processing tool to help detect and correct non-linear technical variation in microarray data and demonstrate how it can reduce technical variation and improve the results of downstream statistical gene selection and pathway identification methods.
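The abstract names three steps: iterative selection of a global rank-invariant set, construction of a robust average reference, and normalization of every sample against that reference. A minimal Python sketch of that general idea follows. It is not the authors' implementation. The function names, the `keep` and `n_iter` parameters, the row-wise median used as the "robust average", and the low-degree polynomial used as the intensity-dependent correction curve are all assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def rank_matrix(x):
    """Ordinal rank of each row within each column (0 = lowest value)."""
    return np.argsort(np.argsort(x, axis=0), axis=0)


def select_global_rank_invariant(log_expr, keep=200, n_iter=4):
    """Iteratively discard the rows whose rank varies most across samples.

    log_expr: (probe sets x samples) array of log-scale expression values.
    keep and n_iter are hypothetical tuning parameters for this sketch.
    Returns the sorted row indices of the retained set.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    idx = np.arange(log_expr.shape[0])
    keep = min(keep, len(idx))
    # Shrink the candidate set geometrically from all rows down to `keep`.
    sizes = np.geomspace(len(idx), keep, n_iter + 1)[1:].round().astype(int)
    for size in sizes:
        # Re-rank within the surviving rows only, so rows dropped earlier
        # (e.g. strongly regulated genes) no longer disturb the ranks.
        ranks = rank_matrix(log_expr[idx])
        spread = ranks.std(axis=1)
        idx = idx[np.argsort(spread, kind="stable")[:size]]
    return np.sort(idx)


def normalize_to_reference(log_expr, invariant_idx, degree=2):
    """Remove the intensity-dependent trend of each sample vs. a reference.

    Assumptions of this sketch: the reference is the row-wise median, and
    the trend is a polynomial of the given degree fitted on the invariant
    rows only, then applied to all rows.
    """
    log_expr = np.asarray(log_expr, dtype=float)
    ref = np.median(log_expr, axis=1)
    out = np.empty_like(log_expr)
    for j in range(log_expr.shape[1]):
        y = log_expr[:, j]
        a = 0.5 * (y + ref)  # average log intensity
        m = y - ref          # log ratio against the reference
        coef = np.polyfit(a[invariant_idx], m[invariant_idx], degree)
        out[:, j] = y - np.polyval(coef, a)
    return out
```

Given a log-scale matrix `data` with probe sets in rows and samples in columns, a call would look like `inv = select_global_rank_invariant(data)` followed by `normalized = normalize_to_reference(data, inv)`. Re-ranking inside each iteration is the part meant to protect unbalanced regulation: once strongly regulated rows are dropped, they stop shifting the ranks of the remaining rows. A polynomial can extrapolate poorly outside the intensity range covered by the invariant set, so a local smoother may be a better choice in practice.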
Pages: 18