Iterative random projections for high-dimensional data clustering

被引:21
作者
Cardoso, Angelo [1 ]
Wichert, Andreas
机构
[1] Univ Tecn Lisboa, INESC ID Lisboa, P-2744016 Porto Salvo, Portugal
关键词
Clustering; K-means; High-dimensional data; Random projections; ALGORITHM; JOHNSON;
D O I
10.1016/j.patrec.2012.06.007
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this text we propose a method which efficiently performs clustering of high-dimensional data. The method builds on random projection and the K-means algorithm. The idea is to apply K-means several times, increasing the dimensionality of the data after each convergence of K-means. We compare the proposed algorithm on four high-dimensional datasets, image, text and two synthetic, with K-means clustering using a single random projection and K-means clustering of the original high-dimensional data. Regarding time we observe that the algorithm reduces drastically the time when compared to K-means on the original high-dimensional data. Regarding mean squared error the proposed method reaches a better solution than clustering using a single random projection. More notably in the experiments performed it also reaches a better solution than clustering on the original high-dimensional data. (c) 2012 Elsevier B.V. All rights reserved.
引用
收藏
页码:1749 / 1755
页数:7
相关论文
共 12 条
[1]   Database-friendly random projections: Johnson-Lindenstrauss with binary coins [J].
Achlioptas, D .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :671-687
[2]   Computational experience on four algorithms for the hard clustering problem [J].
AlSultan, KS ;
Khan, MM .
PATTERN RECOGNITION LETTERS, 1996, 17 (03) :295-308
[3]  
[Anonymous], 2006, P ACM SIGKDD INT C K
[4]  
[Anonymous], 2000, P16 C UNC ART INT
[5]  
Boutsidis C., 2010, ADV NEURAL INFORM PR, V23, P298
[6]   An elementary proof of a theorem of Johnson and Lindenstrauss [J].
Dasgupta, S ;
Gupta, A .
RANDOM STRUCTURES & ALGORITHMS, 2003, 22 (01) :60-65
[7]  
Hecht-Nielsen Robert, 1994, Computational intelligence: Imitating life, P43
[8]  
Johnson William B, 1984, C MODERN ANAL PROBAB
[9]   A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions [J].
Kumar, A ;
Sabharwal, Y ;
Sen, S .
45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :454-462
[10]  
LLOYD SP, 1982, IEEE T INFORM THEORY, V28, P129, DOI 10.1109/TIT.1982.1056489