Novel clustering algorithm for microarray expression data in a truncated SVD space

Cited by: 46
Authors
Horn, D [1]
Axel, I [1]
Affiliations
[1] Tel Aviv Univ, Raymond & Beverly Sackler Fac Exact Sci, Sch Phys & Astron, IL-69978 Tel Aviv, Israel
DOI
10.1093/bioinformatics/btg053
Chinese Library Classification (CLC) number
Q5 [Biochemistry];
Discipline classification code
071010 ; 081704 ;
Abstract
Motivation: This paper introduces the application of a novel clustering method to microarray expression data. Its first stage involves compression of dimensions that can be achieved by applying SVD to the gene-sample matrix in microarray problems. Thus the data (samples or genes) can be represented by vectors in a truncated space of low dimensionality, 4 and 5 in the examples studied here. We find it preferable to project all vectors onto the unit sphere before applying a clustering algorithm. The clustering algorithm used here is the quantum clustering method that has one free scale parameter. Although the method is not hierarchical, it can be modified to allow hierarchy in terms of this scale parameter.

Results: We apply our method to three data sets. The results are very promising. On cancer cell data we obtain a dendrogram that reflects correct groupings of cells. In an AML/ALL data set we obtain very good clustering of samples into four classes of the data. Finally, in clustering of genes in yeast cell cycle data we obtain four groups in a problem that is estimated to contain five families.
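
The pipeline outlined in the abstract (truncated SVD, projection onto the unit sphere, then quantum clustering with a single scale parameter) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation. The function names, the choice of unscaled right singular vectors as sample coordinates, and the parameter name `sigma` for the scale are all assumptions made here. The potential follows the form commonly given for quantum clustering (a Parzen-window wave function with Gaussian width `sigma`), computed up to an additive constant; locating its minima, which serve as cluster centers, is left out.

```python
import numpy as np


def truncated_svd_coords(X, k):
    """Coordinates of the columns (samples) of a gene-by-sample matrix X
    in the space of the k leading right singular vectors.
    Assumption: coordinates are not rescaled by the singular values."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # shape (n_samples, k)


def to_unit_sphere(P):
    """Normalize each row vector to unit Euclidean length."""
    norms = np.linalg.norm(P, axis=1, keepdims=True)
    return P / np.where(norms == 0, 1.0, norms)  # leave zero rows untouched


def quantum_potential(points, data, sigma):
    """Quantum-clustering potential at `points`, up to an additive constant.

    With psi(x) = sum_i exp(-|x - x_i|^2 / (2 sigma^2)), this returns
    sum_i |x - x_i|^2 exp(-|x - x_i|^2 / (2 sigma^2)) / (2 sigma^2 psi(x)).
    """
    # Squared distances between every evaluation point and every data point.
    d2 = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2.0 * sigma**2))
    return (d2 * w).sum(axis=1) / (2.0 * sigma**2 * w.sum(axis=1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 8))  # hypothetical 50 genes x 8 samples
    coords = to_unit_sphere(truncated_svd_coords(X, k=4))
    V = quantum_potential(coords, coords, sigma=0.5)
    print(coords.shape, V.shape)
```

Clustering genes instead of samples would use the rows of the left singular vector matrix in the same way. Smaller values of `sigma` tend to produce more minima of the potential, which is how the scale parameter can drive a hierarchy.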
Pages: 1110-1115
Page count: 6