Binary state pattern clustering: A digital paradigm for class and biomarker discovery in gene microarray studies of cancer

Cited by: 7
Authors
Beattie, Bradley J.
Robinson, Peter N.
Affiliations
[1] Mem Sloan Kettering Canc Ctr, Dept Neurol, New York, NY 10021 USA
[2] Humboldt Univ, Charite Univ Hosp, Inst Med Genet, Berlin, Germany
Keywords
biclustering; biomarker discovery; class discovery; clustering; gene microarray
DOI
10.1089/cmb.2006.13.1114
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Class and biomarker discovery continue to be among the preeminent goals in gene microarray studies of cancer. We have developed a new data mining technique, which we call Binary State Pattern Clustering (BSPC), that is specifically adapted for these purposes with cancer and other categorical datasets. BSPC is capable of uncovering statistically significant sample subclasses and associated marker genes in a completely unsupervised manner. This is accomplished through the application of a digital paradigm, in which the expression level of each potential marker gene is treated as being representative of its discrete functional state. Multiple genes that divide samples into states along the same boundaries form a kind of gene-cluster that has an associated sample-cluster. BSPC is an extremely fast deterministic algorithm that scales well to large datasets. Here we describe the results of its application to three publicly available oligonucleotide microarray datasets. Using an alpha-level of 0.05, clusters reproducing many of the known sample classifications were identified, along with associated biomarkers. In addition, a number of simulations were conducted using shuffled versions of each of the original datasets, noise-added datasets, and completely artificial datasets. The robustness of BSPC was compared to that of three other publicly available clustering methods: ISIS, CTWC, and SAMBA. The simulations demonstrate BSPC's substantially greater noise tolerance and confirm the accuracy of our calculations of statistical significance.
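The digital paradigm described in the abstract can be illustrated with a toy sketch. This is not the published BSPC algorithm: the median threshold, the treatment of a pattern and its complement as the same sample split, the `min_genes` cutoff, and all function and gene names are assumptions made for illustration only, and no significance test is attempted.

```python
from collections import defaultdict
from statistics import median


def binarize(values, threshold):
    """Map each expression value to a binary state (1 = above threshold)."""
    return tuple(1 if v > threshold else 0 for v in values)


def canonical(pattern):
    """Treat a pattern and its complement as one split of the samples.

    Assumption: a gene that is 'on' exactly where another is 'off'
    divides the samples along the same boundary.
    """
    return pattern if pattern[0] == 0 else tuple(1 - b for b in pattern)


def group_genes_by_pattern(expression, min_genes=2):
    """Group genes whose binary state patterns split samples identically.

    expression: dict mapping a (hypothetical) gene name to a list of
    expression values, one per sample. The per-gene median is an assumed
    threshold, not the one used in the paper.
    """
    groups = defaultdict(list)
    for gene, values in expression.items():
        pattern = canonical(binarize(values, median(values)))
        groups[pattern].append(gene)
    # Keep only patterns supported by several genes (a toy "gene-cluster").
    return {p: sorted(g) for p, g in groups.items() if len(g) >= min_genes}


# Invented toy data: six samples, four genes.
expression = {
    "geneA": [1.0, 1.2, 0.9, 5.1, 4.8, 5.3],
    "geneB": [0.2, 0.1, 0.3, 2.0, 2.2, 1.9],
    "geneC": [7.0, 6.5, 7.2, 1.1, 0.9, 1.0],  # inverse of geneA and geneB
    "geneD": [3.0, 0.5, 3.1, 0.4, 2.9, 0.6],  # splits samples differently
}

clusters = group_genes_by_pattern(expression)
for pattern, genes in clusters.items():
    print(pattern, genes)
```

Each key of `clusters` is a binary pattern over the samples (the associated sample-cluster), and each value is the list of genes sharing that split (the gene-cluster). Exact pattern matching is the simplest possible grouping rule and has no tolerance for noise, so it should be read only as a picture of the idea.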
Pages: 1114-1130
Number of pages: 17