Fast and robust general purpose clustering algorithms

Cited by: 32
Authors
Estivill-Castro, V [1]
Yang, J
Affiliations
[1] Griffith Univ, Sch Comp & Informat Technol, Nathan, Qld 4111, Australia
[2] Univ Western Sydney Macarthur, Sch Comp & Informat Technol, Campbelltown, NSW 2560, Australia
Keywords
clustering; k-MEANS; medoids; 1-median problem; combinatorial optimization; EXPECTATION MAXIMIZATION
DOI
10.1023/B:DAMI.0000015869.08323.b3
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
General-purpose, widely applicable clustering methods are usually required in the early stages of knowledge discovery exercises. k-MEANS has been adopted as the prototype of iterative model-based clustering because of its speed, its simplicity and its ability to work within the format of very large databases. However, k-MEANS has several disadvantages that stem from its statistical simplicity. We propose an algorithm that remains very efficient, generally applicable and multidimensional, but is more robust to noise and outliers. We achieve this by using medians rather than means as estimators for the centers of clusters. Comparison with k-MEANS, EXPECTATION MAXIMIZATION and GIBBS sampling demonstrates the advantages of our algorithm.
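The central idea of the abstract, replacing the mean with a median-type estimator for each cluster center, can be illustrated with a minimal Python sketch. It assumes Euclidean (not squared) distance and reads "median" as the discrete 1-median, or medoid, of each cluster, following the keywords. Everything in the sketch is hypothetical: the function names (`dist`, `assign`, `medoid`, `mean_point`, `k_medoids`), the explicit initial centers and the toy data. It is not the authors' algorithm, whose search strategy and initialization are not described in this record.

```python
# Illustrative sketch (assumption): a medoid-based variant of the k-MEANS loop,
# NOT the algorithm published in the paper.
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (not squared) distance, the cost used by the 1-median objective."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points: List[Point], centers: List[Point]) -> List[int]:
    """Label each point with the index of its nearest center (ties go to the lower index)."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j])) for p in points]


def medoid(cluster: List[Point]) -> Point:
    """Discrete 1-median: the member with the smallest total distance to the others."""
    return min(cluster, key=lambda c: sum(dist(c, q) for q in cluster))


def mean_point(cluster: List[Point]) -> Point:
    """Coordinate-wise mean, i.e. the k-MEANS center estimator, for comparison."""
    n = len(cluster)
    return tuple(sum(p[i] for p in cluster) / n for i in range(len(cluster[0])))


def k_medoids(points: List[Point], initial_centers: List[Point], max_iter: int = 100):
    """Alternate nearest-center assignment and medoid updates until centers stop moving."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(medoid(members) if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(points, centers)


if __name__ == "__main__":
    # Two tight groups plus one far outlier (made-up toy data).
    points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
              (50.0, 50.0)]
    # First k points as initial centers, chosen only for reproducibility.
    centers, labels = k_medoids(points, points[:2])
    print(labels)                  # [0, 0, 0, 0, 1, 1, 1, 1, 1]
    print(centers[1])              # (11.0, 11.0): stays inside the dense group
    print(mean_point(points[4:]))  # (18.4, 18.4): the mean is dragged toward the outlier
```

On this toy data the outlier joins the second cluster, yet the medoid stays at one of the four dense points while the mean of the same cluster moves to (18.4, 18.4). The exhaustive medoid search costs quadratic time per cluster, so this sketch shows only the robustness argument and not the efficiency the abstract claims.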
Pages: 127-150
Number of pages: 24
Related papers
50 in total
  • [41] Comparing Clustering Algorithms On Wisconsin Data Set
    Erken, Mucahit
    2016 24TH SIGNAL PROCESSING AND COMMUNICATION APPLICATION CONFERENCE (SIU), 2016: 1541-1544
  • [42] A Comparative Study of Clustering Algorithms for Mixed Datasets
    Harous, Saad
    Al Harmoodi, Maryam
    Biri, Hessa
    PROCEEDINGS 2019 AMITY INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AICAI), 2019: 484-488
  • [43] New Clustering Algorithms for Twitter Sentiment Analysis
    Rehioui, Hajar
    Idrissi, Abdellah
    IEEE SYSTEMS JOURNAL, 2020, 14 (01): 530-537
  • [44] Scalable Clustering Algorithms for Big Data: A Review
    Mahdi, Mahmoud A.
    Hosny, Khalid M.
    Elhenawy, Ibrahim
    IEEE ACCESS, 2021, 9: 80015-80027
  • [45] A survey on parallel clustering algorithms for Big Data
    Dafir, Zineb
    Lamari, Yasmine
    Slaoui, Said Chah
    ARTIFICIAL INTELLIGENCE REVIEW, 2021, 54 (04): 2411-2443
  • [46] Comparison of Clustering Algorithms for Revenue and Cost Analysis
    Boyko, Nataliya
    Hetman, Solomiya
    Kots, Iryna
    COLINS 2021: COMPUTATIONAL LINGUISTICS AND INTELLIGENT SYSTEMS, VOL I, 2021, 2870
  • [47] Comparative Analysis of Optimized Algorithms for Ontology Clustering
    Tiwari, Avantika
    Kumar, Ajay
    2018 5TH IEEE UTTAR PRADESH SECTION INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS AND COMPUTER ENGINEERING (UPCON), 2018: 740-746
  • [48] Using SVM and Clustering Algorithms in IDS Systems
    Scherer, Peter
    Vicher, Martin
    Drazdilova, Pavla
    Martinovic, Jan
    Dvorsky, Jiri
    Snasel, Vaclav
    DATESO 2011: DATABASES, TEXTS, SPECIFICATIONS, OBJECTS, 2011, 706: 108-119
  • [49] CLUSTERING IN GENERAL MEASUREMENT ERROR MODELS
    Su, Ya
    Reedy, Jill
    Carroll, Raymond J.
    STATISTICA SINICA, 2018, 28 (04): 2337-2351
  • [50] Comparative study of Data Mining Clustering algorithms
    Venkatkumar, Iyer Aurobind
    Shardaben, Sanatkumar Jayantibhai Kondhol
    PROCEEDINGS OF THE 2016 INTERNATIONAL CONFERENCE ON DATA SCIENCE & ENGINEERING (ICDSE), 2016: 72-78