Comparison of the predictive accuracy of DNA array-based multigene classifiers across cDNA arrays and Affymetrix GeneChips

Times cited: 39
Authors
Stec, J
Wang, J
Coombes, K
Ayers, M
Hoersch, S
Gold, DL
Ross, JS
Hess, KR
Tirrell, S
Linette, G
Hortobagyi, GN
Symmans, WF
Pusztai, L
Affiliations
[1] Univ Texas, MD Anderson Canc Ctr, Dept Breast Med Oncol, Houston, TX 77030 USA
[2] Univ Texas, MD Anderson Canc Ctr, Dept Biostat, Houston, TX 77030 USA
[3] Univ Texas, MD Anderson Canc Ctr, Dept Pathol, Houston, TX 77030 USA
[4] Millennium Pharmaceut Inc, Cambridge, MA USA
[5] Praecis Pharmaceut, Waltham, MA USA
DOI
10.1016/S1525-1578(10)60565-X
Chinese Library Classification (CLC) number
R36 [Pathology]
Subject classification code
100104
Abstract
We examined how well differentially expressed genes and multigene outcome classifiers retain their class-discriminating value when tested on data generated by different transcriptional profiling platforms. RNA from 33 stage I-III breast cancers was hybridized to both Affymetrix GeneChip and Millennium Pharmaceuticals cDNA arrays. When UniGene was used to match probes, only 30% of all corresponding gene expression measurements on the two platforms had a Pearson correlation coefficient r >= 0.7. Correlation varied substantially between different Affymetrix probe sets matched to the same cDNA probe. When cDNA and Affymetrix probes were matched by Basic Local Alignment Search Tool (BLAST) sequence identity, the correlation increased substantially. We identified 182 genes in the Affymetrix data and 45 in the cDNA data (including 17 genes common to both) that accurately separated 91% of cases in supervised hierarchical clustering in each data set. Cross-platform testing of these informative genes lowered clustering accuracy to 45% and 79%, respectively. Several sets of accurate five-gene classifiers were developed on each platform using linear discriminant analysis. The best 100 classifiers showed an average misclassification error rate of 2% on the original data, which rose to 19.5% when they were tested on data from the other platform. Random five-gene classifiers showed a misclassification error rate of 33%. We conclude that multigene predictors optimized for one platform lose accuracy when applied to data from another platform, owing to missing genes and to sequence differences in probes that yield differing measurements for the same gene.
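The two summary statistics in the abstract, the share of matched probe pairs with cross-platform Pearson r >= 0.7 and the misclassification error rate of a classifier, can be illustrated with a minimal Python sketch. It assumes that probes have already been matched across platforms (by UniGene or BLAST) and that each pair's values are aligned by sample, i.e. the same tumors in the same order on both arrays. The function names (`pearson`, `fraction_concordant`, `misclassification_error`) and the toy numbers are hypothetical illustrations, not the authors' code or data.

```python
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two sample-aligned expression vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # r is undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def fraction_concordant(
    matched: Dict[str, Tuple[List[float], List[float]]], threshold: float = 0.7
) -> float:
    """Share of matched probe pairs whose cross-platform r >= threshold.

    `matched` maps a (hypothetical) gene ID to its (Affymetrix, cDNA) vectors.
    NaN never satisfies `>=`, so constant profiles are counted as discordant.
    """
    rs = [pearson(affy, cdna) for affy, cdna in matched.values()]
    return sum(1 for r in rs if r >= threshold) / len(rs)


def misclassification_error(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of cases whose predicted class differs from the true class."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two equal-length, non-empty label sequences")
    return sum(1 for p, a in zip(predicted, actual) if p != a) / len(actual)


# Hypothetical toy data: 4 samples per gene, (Affymetrix, cDNA) pairs.
toy_pairs = {
    "G1": ([1, 2, 3, 4], [2, 4, 6, 8]),  # r = 1.0
    "G2": ([1, 2, 3, 4], [4, 3, 2, 1]),  # r = -1.0
    "G3": ([1, 2, 3, 4], [1, 3, 2, 4]),  # r = 0.8
    "G4": ([1, 2, 3, 4], [2, 1, 4, 3]),  # r = 0.6
}

if __name__ == "__main__":
    print(fraction_concordant(toy_pairs))                        # 0.5
    print(misclassification_error([1, 0, 0, 1], [1, 0, 1, 1]))   # 0.25
```

In this toy example only G1 and G3 reach r >= 0.7, so half of the matched pairs count as concordant. The classifier itself (linear discriminant analysis on five genes in the study) is not reproduced here; the error function only scores whatever predictions a classifier produces, whether it is tested on the platform it was trained on or on the other one.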
Pages: 357-367
Page count: 11