Normalization and Gene p-Value Estimation: Issues in Microarray Data Processing

Cited by: 15
Authors
Fundel, Katrin [1]
Kueffner, Robert [1]
Aigner, Thomas [2]
Zimmer, Ralf [1]
Affiliations
[1] Univ Munich, Inst Informat, Amalienstrasse 17, D-80363 Munich, Germany
[2] Univ Leipzig, Inst Pathol, D-04103 Leipzig, Germany
Keywords
expression data; normalization; regulated genes; data processing
DOI
10.4137/BBI.S441
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Introduction: Numerous methods exist for basic processing of microarray gene expression data, e.g. normalization. These methods have an important effect on the final analysis outcome, so it is crucial to select methods appropriate for a given dataset in order to ensure the validity and reliability of expression data analysis. Furthermore, biological interpretation requires expression values for genes, which are often represented by several spots or probe sets on a microarray. How best to integrate spot/probe set values into gene values has so far been a somewhat neglected problem.
Results: We present a case study comparing different between-array normalization methods with respect to the identification of differentially expressed genes. Our results show that it is feasible and necessary to use prior knowledge on gene expression measurements to select an adequate normalization method for the given data. Furthermore, we provide evidence that combining spot/probe set p-values into gene p-values for detecting differentially expressed genes has advantages over combining spot/probe set expression values into gene expression values. The comparison of different methods suggests using Stouffer's method for this purpose. The study was conducted on gene expression experiments investigating human joint cartilage samples of osteoarthritis-related groups: a cDNA microarray data set (83 samples, four groups) and an Affymetrix data set (26 samples, two groups).
Conclusion: The apparently straightforward steps of gene expression data analysis, e.g. between-array normalization and detection of differentially regulated genes, can be accomplished by numerous different methods. We analyzed multiple methods and their possible effects, and thereby demonstrate the importance of the individual decisions taken during data processing. We give guidelines for evaluating normalization outcomes. An overview of these effects via appropriate measures and plots, compared to prior knowledge, is essential for the biological interpretation of gene expression measurements.
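The abstract recommends Stouffer's method for combining spot/probe set p-values into a single gene p-value. As an illustration only, the following is a minimal sketch of the standard unweighted Stouffer combination, assuming one-sided p-values and equal weights for all probe sets. The function name `stouffer_combine`, the clipping threshold `eps`, and the gene and p-value data are hypothetical and not taken from the paper, which may use a different variant (e.g. weighted or two-sided).

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def stouffer_combine(p_values, eps=1e-12):
    """Combine one-sided p-values into one p-value (unweighted Stouffer).

    Each p-value is mapped to a z-score, the z-scores are summed and
    divided by sqrt(k), and the result is mapped back to a p-value.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    # Clip to (0, 1): the inverse normal CDF is undefined at exactly 0 or 1.
    clipped = [min(max(p, eps), 1.0 - eps) for p in p_values]
    # Small p-value -> large positive z-score.
    z_scores = [_STD_NORMAL.inv_cdf(1.0 - p) for p in clipped]
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


# Hypothetical example: several probe-set p-values per gene.
probe_p = {"GENE_A": [0.04, 0.03, 0.20], "GENE_B": [0.60, 0.45]}
gene_p = {gene: stouffer_combine(ps) for gene, ps in probe_p.items()}
```

With this formulation, a gene measured by a single probe set keeps its original p-value, while several moderately small p-values that agree reinforce each other and yield a smaller combined p-value.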
Pages: 291-305
Number of pages: 15