Data decomposition for parallel K-means clustering

被引:0
作者
Gursoy, A [1 ]
机构
[1] Koc Univ, Dept Comp Engn, TR-34450 Istanbul, Turkey
来源
PARALLEL PROCESSING AND APPLIED MATHEMATICS | 2004年 / 3019卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Developing fast algorithms for clustering has been an important area of research in data mining and other fields. K-means is one of the widely used clustering algorithms. In this work, we have developed and evaluated parallelization of k-means method for low-dimensional data on message passing computers. Three different data decomposition schemes and their impact on the pruning of distance calculations in tree-based k-means algorithm have been studied. Random pattern decomposition has good load balancing but fails to prune distance calculations effectively. Compact spatial decomposition of patterns based on space filling curves outperforms random pattern decomposition even though it has load imbalance problem. In both cases, parallel tree-based k-means clustering runs significantly faster than the traditional parallel k-means.
引用
收藏
页码:241 / 248
页数:8
相关论文
共 9 条
  • [1] ALSABTI K, 1998, IPPS SPDP 1 WORKSH H
  • [2] [Anonymous], 1989, The Design and Analysis of Spatial Data Structures
  • [3] DHILLON I, 1999, WORKSH LARG SCAL PAR
  • [4] Gursoy A., 2001, Euro-Par 2001 Parallel Processing. 7th International Euro-Par Conference. Proceedings (Lecture Notes in Computer Science Vol.2150), P321
  • [5] Data clustering: A review
    Jain, AK
    Murty, MN
    Flynn, PJ
    [J]. ACM COMPUTING SURVEYS, 1999, 31 (03) : 264 - 323
  • [6] JUDD D, 1996, P 13 INT C PATT REC
  • [7] MCQUEEN J, 1997, P 5 BERK S MATH STAT, P173
  • [8] LOAD BALANCING AND DATA LOCALITY IN ADAPTIVE HIERARCHICAL N-BODY METHODS - BARNES-HUT, FAST MULTIPOLE, AND RADIOSITY
    SINGH, JP
    HOLT, C
    TOTSUKA, T
    GUPTA, A
    HENNESSY, J
    [J]. JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING, 1995, 27 (02) : 118 - 141
  • [9] Molecular descriptors for effective classification of biologically active compounds based on principal component analysis identified by a genetic algorithm
    Xue, L
    Bajorath, J
    [J]. JOURNAL OF CHEMICAL INFORMATION AND COMPUTER SCIENCES, 2000, 40 (03): : 801 - 809