In search of optimal centroids on data clustering using a binary search algorithm

Cited by: 41
Authors
Hatamlou, Abdolreza [1, 2]
Affiliations
[1] Univ Kebangsaan Malaysia, Data Min & Optimizat Res Grp, Ctr Artificial Intelligence Technol, Bangi 43600, Selangor, Malaysia
[2] Islamic Azad Univ, Khoy Branch, Tehran, Iran
Keywords
A binary search algorithm; Optimal centroids; Data clustering; IMAGE SEGMENTATION
DOI
10.1016/j.patrec.2012.06.008
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Data clustering is an important technique in data mining. It is a method of partitioning data into clusters, in which each cluster must have data of great similarity and different clusters must have data of high dissimilarity. Many clustering algorithms are found in the literature. In general, there is no single algorithm that is suitable for all types of data, conditions and applications. Each algorithm has its own advantages, limitations and shortcomings. Therefore, introducing novel and effective approaches for data clustering is an open and active research area. This paper presents a novel binary search algorithm for data clustering that not only finds high quality clusters but also converges to the same solution in different runs. In the proposed algorithm, a set of initial centroids is chosen from different parts of the test dataset, and then optimal locations for the centroids are found by thoroughly exploring around the initial centroids. The simulation results using six benchmark datasets from the UCI Machine Learning Repository indicate that the proposed algorithm can efficiently be used for data clustering. (c) 2012 Elsevier B.V. All rights reserved.
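The abstract names two stages: a deterministic choice of initial centroids spread across the data, followed by a local search around them. The sketch below illustrates that general idea only and is not the paper's algorithm. The even-spacing rule in `initial_centroids`, the sum-of-distances objective, the coordinate-wise moves, and the step-halving rule in `refine` are all assumptions made for illustration, and every function and parameter name is hypothetical.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def total_distance(points, centroids):
    """Assumed objective: sum of distances from each point to its nearest centroid."""
    return sum(min(distance(p, c) for c in centroids) for p in points)


def initial_centroids(points, k):
    """Assumed rule: spread k centroids evenly between per-feature min and max."""
    dims = range(len(points[0]))
    lo = [min(p[d] for p in points) for d in dims]
    hi = [max(p[d] for p in points) for d in dims]
    return [[lo[d] + (i + 0.5) * (hi[d] - lo[d]) / k for d in dims]
            for i in range(k)]


def refine(points, centroids, shrink=0.5, tol=1e-6, max_rounds=10_000):
    """Explore around each centroid one coordinate at a time.

    An improving move is kept; when a full pass yields no improvement,
    the step is multiplied by `shrink` (halved by default), in the
    spirit of a binary search. No randomness is used, so repeated runs
    on the same data return the same result.
    """
    dims = range(len(points[0]))
    centroids = [list(c) for c in centroids]
    best = total_distance(points, centroids)
    # Start with half of the widest feature range as the step size.
    step = max(max(p[d] for p in points) - min(p[d] for p in points)
               for d in dims) / 2
    rounds = 0
    while step > tol and rounds < max_rounds:
        rounds += 1
        improved = False
        for c in centroids:
            for d in dims:
                for delta in (step, -step):
                    c[d] += delta
                    cost = total_distance(points, centroids)
                    if cost < best:
                        best, improved = cost, True
                        break
                    c[d] -= delta  # undo a non-improving move
        if not improved:
            step *= shrink
    return centroids, best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    start = initial_centroids(data, k=2)
    final, cost = refine(data, start)
    print(final, cost)
```

Because both stages are deterministic, this sketch returns identical centroids on every run for a given dataset, which mirrors the reproducibility property the abstract claims for the proposed method.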
Pages: 1756-1760
Number of pages: 5
Related papers
35 in total
  • [1] A TABU SEARCH APPROACH TO THE CLUSTERING PROBLEM
    ALSULTAN, KS
    [J]. PATTERN RECOGNITION, 1995, 28 (09) : 1443 - 1451
  • [2] A document clustering algorithm for discovering and describing topics
    Anaya-Sanchez, Henry
    Pons-Porrata, Aurora
    Berlanga-Llavori, Rafael
    [J]. PATTERN RECOGNITION LETTERS, 2010, 31 (06) : 502 - 510
  • [3] [Anonymous], 2006, SURVEY CLUSTERING DA
  • [4] [Anonymous], UCI Repository of machine learning databases
  • [5] Barbakh WA, 2009, STUD COMPUT INTELL, V249, P7
  • [6] Camastra F, 2008, Machine learning for audio, image and video analysis, P117, DOI 10.1007/978-1-84800-007-0_6
  • [7] A genetic algorithm approach to cluster analysis
    Cowgill, MC
    Harvey, RJ
    Watson, LT
    [J]. COMPUTERS & MATHEMATICS WITH APPLICATIONS, 1999, 37 (07) : 99 - 108
  • [8] Application of honey-bee mating optimization algorithm on clustering
    Fathian, Mohammad
    Amiri, Babak
    Maroosi, Ali
    [J]. APPLIED MATHEMATICS AND COMPUTATION, 2007, 190 (02) : 1502 - 1513
  • [9] FORGY EW, 1965, BIOMETRICS, V21, P768
  • [10] Dynamic hierarchical algorithms for document clustering
    Gil-Garcia, Reynaldo
    Pons-Porrata, Aurora
    [J]. PATTERN RECOGNITION LETTERS, 2010, 31 (06) : 469 - 477