A Novel Density based Clustering Algorithm and Its Parallelization

被引:13
作者
Li, Xiaokang [1 ]
Yu, Binbin [1 ]
Zhou, Yinghua [1 ]
Sun, Guangzhong [1 ]
机构
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230026, Anhui, Peoples R China
来源
2014 15TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT 2014) | 2014年
关键词
Kmms; clustering algorithm; density; parallelize; INITIALIZATION;
D O I
10.1109/PDCAT.2014.9
中图分类号
TP301 [理论、方法];
学科分类号
081202 ;
摘要
k-Means, a simple but effective clustering algorithm, is widely used in data mining, machine learning and computer vision community. k-Means algorithm consists of initialization of duster centers and iteration. The initial duster centers have a great impact on duster result and algorithm efficiency. More appropriate initial centers of k-Means can get closer to the optimum solution, and even much quicker convergence. In this paper, we propose a novel clustering algorithm, Kmms, which is the abbreviation of k-Means and Mean Shift. It is a density based algorithm. Experiments show our algorithm not only costs less initialization time compared with other density based algorithms, but also achieves better clustering quality and higher efficiency. And compared with the popular k-Means++ algorithm, our method gets comparable accuracy, mostly even better. Furthermore, we parallelize Kmms algorithm based on OPenMP from both initialization and iteration step and prove the convergence of the algorithm.
引用
收藏
页码:1 / 6
页数:6
相关论文
共 21 条
  • [1] New methods for the initialisation of clusters
    AlDaoud, MB
    Roberts, SA
    [J]. PATTERN RECOGNITION LETTERS, 1996, 17 (05) : 451 - 455
  • [2] Arthur D, 2007, PROCEEDINGS OF THE EIGHTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, P1027
  • [3] Bache K., 2014, UCI MACHINE LEARNING
  • [4] A CLUSTERING TECHNIQUE FOR SUMMARIZING MULTIVARIATE DATA
    BALL, GH
    HALL, DJ
    [J]. BEHAVIORAL SCIENCE, 1967, 12 (02): : 153 - &
  • [5] Mean shift: A robust approach toward feature space analysis
    Comaniciu, D
    Meer, P
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (05) : 603 - 619
  • [6] A data-clustering algorithm on distributed memory multiprocessors
    Dhillon, IS
    Modha, DS
    [J]. LARGE-SCALE PARALLEL DATA MINING, 2000, 1759 : 245 - 260
  • [7] Clustering large graphs via the Singular Value Decomposition
    Drineas, P
    Frieze, A
    Kannan, R
    Vempala, S
    Vinay, V
    [J]. MACHINE LEARNING, 2004, 56 (1-3) : 9 - 33
  • [8] Ene A., 2011, P 17 ACM KDD, P681, DOI DOI 10.1145/2020408.2020515
  • [9] A new algorithm for initial cluster centers in k-means algorithm
    Erisoglu, Murat
    Calis, Nazif
    Sakallioglu, Sadullah
    [J]. PATTERN RECOGNITION LETTERS, 2011, 32 (14) : 1701 - 1705
  • [10] FORGY EW, 1965, BIOMETRICS, V21, P768