Fast and robust general purpose clustering algorithms

Cited by: 32
Authors
Estivill-Castro, V [1]
Yang, J
Affiliations
[1] Griffith Univ, Sch Comp & Informat Technol, Nathan, Qld 4111, Australia
[2] Univ Western Sydney Macarthur, Sch Comp & Informat Technol, Campbelltown, NSW 2560, Australia
Keywords
clustering; k-MEANS; medoids; 1-median problem; combinatorial optimization; EXPECTATION MAXIMIZATION
DOI
10.1023/B:DAMI.0000015869.08323.b3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
General purpose and highly applicable clustering methods are usually required during the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, simplicity and capability to work within the format of very large databases. However, k-MEANS has several disadvantages derived from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable and multidimensional, but is more robust to noise and outliers. We achieve this by using medians rather than means as estimators for the centers of clusters. Comparison with k-MEANS, EXPECTATION MAXIMIZATION and GIBBS sampling demonstrates the advantages of our algorithm.
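The robustness claim in the abstract can be illustrated with a small sketch. This is not the authors' algorithm; it only contrasts two center estimators on one hypothetical cluster containing a single outlier. The function names `centroid` and `medoid` and the sample points are assumptions made for illustration. The medoid here is the data point that minimizes the sum of Euclidean distances to all other points, i.e. a brute-force solution of the discrete 1-median problem named in the keywords.

```python
import math

def centroid(points):
    """Coordinate-wise mean of the points (the k-MEANS style center)."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def medoid(points):
    """Data point with the smallest total Euclidean distance to all points.

    Brute-force O(n^2) search, intended for illustration only.
    """
    def total_distance(p):
        return sum(math.dist(p, q) for q in points)
    return min(points, key=total_distance)

# Hypothetical cluster: four nearby points plus one distant outlier.
cluster = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (1.1, 1.0), (50.0, 50.0)]

mean_center = centroid(cluster)    # pulled strongly toward the outlier
medoid_center = medoid(cluster)    # stays among the four nearby points

print("mean  :", mean_center)
print("medoid:", medoid_center)
```

The single outlier drags the mean far away from the dense group, while the medoid remains one of the four nearby points. The quadratic search above is only practical for small inputs.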
Pages: 127-150
Page count: 24
Related papers
50 in total
  • [21] A General-Purpose Many-Accelerator Architecture Based on Dataflow Graph Clustering of Applications
    Chen, Peng
    Zhang, Lei
    Han, Yin-He
    Chen, Yun-Ji
    Journal of Computer Science & Technology, 2014, 29 (02) : 239 - 246
  • [22] Comparison of Data Mining Clustering Algorithms
    Shah, Chintan
    Jivani, Anjali
    2013 4TH NIRMA UNIVERSITY INTERNATIONAL CONFERENCE ON ENGINEERING (NUICONE 2013), 2013,
  • [23] Review of Web Document Clustering Algorithms
    Sahu, Sanjib Kumar
    Srivastava, Shalini
    PROCEEDINGS OF THE 10TH INDIACOM - 2016 3RD INTERNATIONAL CONFERENCE ON COMPUTING FOR SUSTAINABLE GLOBAL DEVELOPMENT, 2016, : 1153 - 1155
  • [24] Succinct Initialization Methods for Clustering Algorithms
    Liang, Xueru
    Ren, Shangkun
    Yang, Lei
    ADVANCED INTELLIGENT COMPUTING, 2011, 6838 : 47 - +
  • [27] CAM: Clustering Algorithms for Multimodal WSN
    Medhat, Fady
    Ramadan, Rabie A.
    Talkhan, Ihab
    INTERNATIONAL JOURNAL OF SYSTEM DYNAMICS APPLICATIONS, 2013, 2 (04) : 47 - 67
  • [28] A General Framework for Mixed and Incomplete Data Clustering Based on Swarm Intelligence Algorithms
    Villuendas-Rey, Yenny
    Barroso-Cubas, Eley
    Camacho-Nieto, Oscar
    Yanez-Marquez, Cornelio
    MATHEMATICS, 2021, 9 (07)
  • [29] Robust distance-based clustering with applications to spatial data mining
    Estivill-Castro, V
    Houle, ME
    ALGORITHMICA, 2001, 30 (02) : 216 - 242
  • [30] OUTLIER-AWARE ROBUST CLUSTERING
    Forero, Pedro A.
    Kekatos, Vassilis
    Giannakis, Georgios B.
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 2244 - 2247