Pooling breast cancer datasets has a synergetic effect on classification performance and improves signature stability

Cited by: 66
Authors
van Vliet, Martin H. [1, 2]
Reyal, Fabien [3, 5]
Horlings, Hugo M. [3]
van de Vijver, Marc J. [3, 4]
Reinders, Marcel J. T. [1]
Wessels, Lodewyk F. A. [1, 2]
Affiliations
[1] Delft Univ Technol, Fac Elect Engn Math & Comp Sci, Informat & Commun Theory Grp, NL-2628 CD Delft, Netherlands
[2] Netherlands Canc Inst, Dept Mol Biol, Bioinformat & Stat Grp, NL-1066 CX Amsterdam, Netherlands
[3] Netherlands Canc Inst, Dept Pathol, NL-1066 CX Amsterdam, Netherlands
[4] Acad Med Ctr, Dept Pathol, NL-1100 DD Amsterdam, Netherlands
[5] Inst Curie, Dept Surg, F-75005 Paris, France
DOI
10.1186/1471-2164-9-375
Chinese Library Classification number
Q81 [Bioengineering (Biotechnology)]; Q93 [Microbiology]
Subject classification codes
071005; 0836; 090102; 100705
Abstract
Background: Michiels et al. (Lancet 2005; 365: 488-92) employed a resampling strategy to show that the genes identified as predictors of prognosis from resamplings of a single gene expression dataset are highly variable. The genes most frequently identified in the separate resamplings were put forward as a 'gold standard'. On a higher level, breast cancer datasets collected by different institutions can be considered as resamplings from the underlying breast cancer population. The limited overlap between published prognostic signatures confirms the trend of signature instability identified by the resampling strategy. Six breast cancer datasets, totaling 947 samples, all measured on the Affymetrix platform, are currently available. This provides a unique opportunity to employ a substantial dataset to investigate the effects of pooling datasets on classifier accuracy, signature stability and enrichment of functional categories.
Results: We show that the resampling strategy produces a suboptimal ranking of genes, which cannot be considered a 'gold standard'. When pooling breast cancer datasets, we observed a synergetic effect on the classification performance in 73% of the cases. We also observed a significant positive correlation between the number of datasets that are pooled, the validation performance, the number of genes selected, and the enrichment of specific functional categories. In addition, we evaluated the support for five explanations that have been postulated for the limited overlap of signatures.
Conclusion: The limited overlap of current signature genes can be attributed to small sample size. Pooling datasets results in more accurate classification and a convergence of signature genes. We therefore advocate the analysis of new data within the context of a compendium, rather than analysis in isolation.
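The resampling idea in the abstract can be illustrated with a small sketch: repeatedly subsample a dataset, select the top-scoring genes in each subsample, and record how often each gene is selected. This is not the procedure of Michiels et al. or of the authors. The scoring rule (absolute difference in class means), the synthetic data, and the names `make_toy_data`, `gene_scores` and `selection_frequencies` are all assumptions made for illustration. Pooling is mimicked simply by concatenating several toy datasets.

```python
import random
from collections import Counter


def make_toy_data(n_samples, n_genes, informative, shift, seed):
    """Synthetic expression matrix (hypothetical data, not from the paper).

    Labels alternate 0/1; genes listed in `informative` are shifted
    upward by `shift` in class 1, all other genes are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
        for g in informative:
            row[g] += shift * label
        X.append(row)
        y.append(label)
    return X, y


def gene_scores(X, y, idx):
    """Absolute difference in class means per gene, using only samples in idx.

    This score is an assumed stand-in for whatever ranking statistic
    a real analysis would use.
    """
    pos = [X[i] for i in idx if y[i] == 1]
    neg = [X[i] for i in idx if y[i] == 0]
    scores = []
    for g in range(len(X[0])):
        mean_pos = sum(row[g] for row in pos) / len(pos)
        mean_neg = sum(row[g] for row in neg) / len(neg)
        scores.append(abs(mean_pos - mean_neg))
    return scores


def selection_frequencies(X, y, n_resamples=200, top_k=2,
                          train_fraction=0.5, seed=0):
    """Fraction of resamplings in which each gene lands in the top_k."""
    rng = random.Random(seed)
    pos_idx = [i for i, label in enumerate(y) if label == 1]
    neg_idx = [i for i, label in enumerate(y) if label == 0]
    counts = Counter()
    for _ in range(n_resamples):
        # Stratified subsample so both classes are always represented.
        n_pos = max(1, int(train_fraction * len(pos_idx)))
        n_neg = max(1, int(train_fraction * len(neg_idx)))
        idx = rng.sample(pos_idx, n_pos) + rng.sample(neg_idx, n_neg)
        scores = gene_scores(X, y, idx)
        ranked = sorted(range(len(scores)), key=lambda g: scores[g],
                        reverse=True)
        counts.update(ranked[:top_k])
    return [counts[g] / n_resamples for g in range(len(X[0]))]


if __name__ == "__main__":
    informative = [0, 1]
    # One small toy dataset versus three toy datasets pooled by concatenation.
    parts = [make_toy_data(20, 50, informative, 0.8, seed=s) for s in (1, 2, 3)]
    X_small, y_small = parts[0]
    X_pool = [row for X, _ in parts for row in X]
    y_pool = [label for _, y in parts for label in y]
    for name, X, y in (("single", X_small, y_small), ("pooled", X_pool, y_pool)):
        freqs = selection_frequencies(X, y)
        print(name, [round(freqs[g], 2) for g in informative])
```

Running the script prints, for the single and the pooled toy data, how often the two truly informative genes are selected. With real expression data, the scoring statistic, the subsample size, and the handling of between-dataset differences would all need to be chosen with care.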
Pages: 22