Iterative random projections for high-dimensional data clustering

被引：21

作者：

Cardoso, Angelo ^{[1
]}

Wichert, Andreas

机构：

[1] Univ Tecn Lisboa, INESC ID Lisboa, P-2744016 Porto Salvo, Portugal

来源：

PATTERN RECOGNITION LETTERS | 2012年 / 33卷 / 13期

关键词：

Clustering; K-means; High-dimensional data; Random projections; ALGORITHM; JOHNSON;

D O I：

10.1016/j.patrec.2012.06.007

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

In this text we propose a method which efficiently performs clustering of high-dimensional data. The method builds on random projection and the K-means algorithm. The idea is to apply K-means several times, increasing the dimensionality of the data after each convergence of K-means. We compare the proposed algorithm on four high-dimensional datasets, image, text and two synthetic, with K-means clustering using a single random projection and K-means clustering of the original high-dimensional data. Regarding time we observe that the algorithm reduces drastically the time when compared to K-means on the original high-dimensional data. Regarding mean squared error the proposed method reaches a better solution than clustering using a single random projection. More notably in the experiments performed it also reaches a better solution than clustering on the original high-dimensional data. (c) 2012 Elsevier B.V. All rights reserved.

引用

页码：1749 / 1755

页数：7

共 12 条

[1] Database-friendly random projections: Johnson-Lindenstrauss with binary coins [J].

Achlioptas, D .

JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :671-687

[2] Computational experience on four algorithms for the hard clustering problem [J].

AlSultan, KS ;

Khan, MM .

PATTERN RECOGNITION LETTERS, 1996, 17 (03) :295-308

[3]

[Anonymous], 2006, P ACM SIGKDD INT C K

[4]

[Anonymous], 2000, P16 C UNC ART INT

[5]

Boutsidis C., 2010, ADV NEURAL INFORM PR, V23, P298

[6] An elementary proof of a theorem of Johnson and Lindenstrauss [J].

Dasgupta, S ;

Gupta, A .

RANDOM STRUCTURES & ALGORITHMS, 2003, 22 (01) :60-65

[7]

Hecht-Nielsen Robert, 1994, Computational intelligence: Imitating life, P43

[8]

Johnson William B, 1984, C MODERN ANAL PROBAB

[9] A simple linear time (1+ε)-approximation algorithm for k-means clustering in any dimensions [J].

Kumar, A ;

Sabharwal, Y ;

Sen, S .

45TH ANNUAL IEEE SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE, PROCEEDINGS, 2004, :454-462

[10]

LLOYD SP, 1982, IEEE T INFORM THEORY, V28, P129, DOI 10.1109/TIT.1982.1056489

← 1 2 →