Robust trimmed k-means

Cited by: 6
Authors
Dorabiala, Olga [1 ]
Kutz, J. Nathan [1 ]
Aravkin, Aleksandr Y. [1 ]
Affiliations
[1] Univ Washington, Dept Appl Math, Seattle, WA 98195 USA
Keywords
k-Means; Clustering; Robust statistics; Trimming; Unsupervised learning; OUTLIER DETECTION; FRAMEWORK;
DOI
10.1016/j.patrec.2022.07.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers. (c) 2022 Published by Elsevier B.V.
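The trimming idea the abstract describes can be illustrated with a generic sketch: at each iteration, assign points to their nearest centroid, flag a fixed fraction of points farthest from any centroid as outliers, and recompute centroids from the remaining points only. This is a hedged, minimal sketch of classical trimmed k-means for single-membership data, not the authors' exact RTKM formulation; the function name, parameters, and initialization scheme are illustrative assumptions.

```python
import numpy as np

def trimmed_kmeans(X, k, trim_frac=0.1, n_iter=50, seed=0):
    """Illustrative trimmed k-means (not the authors' exact RTKM):
    each iteration assigns points to nearest centroids, trims the
    trim_frac fraction of points farthest from any centroid, and
    updates centroids from the untrimmed points only."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_trim = int(trim_frac * n)
    # simple random initialization from the data points (an assumption)
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    mask = np.ones(n, dtype=bool)
    for _ in range(n_iter):
        # squared distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)
        # trim: drop the n_trim points with the largest nearest-centroid distance
        mask = np.ones(n, dtype=bool)
        if n_trim:
            mask[np.argsort(nearest)[-n_trim:]] = False
        # recompute each centroid from its untrimmed members
        for j in range(k):
            pts = X[mask & (labels == j)]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids, labels, ~mask  # ~mask marks the flagged outliers
```

Because the trimmed points are excluded from the centroid update, a few extreme outliers cannot drag the centroids away from the bulk of the data, which is the failure mode of plain k-means that the paper targets.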
Pages: 9 - 16
Page count: 8
Related papers
50 records
  • [41] Random Projection for k-means Clustering
    Sieranoja, Sami
    Franti, Pasi
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 680 - 689
  • [42] A notion of stability for k-means clustering
    Le Gouic, T.
    Paris, Q.
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02): : 4239 - 4263
  • [43] The MinMax k-Means clustering algorithm
    Tzortzis, Grigorios
    Likas, Aristidis
    PATTERN RECOGNITION, 2014, 47 (07) : 2505 - 2516
  • [44] Feature weighting in k-means clustering
    Modha, DS
    Spangler, WS
    MACHINE LEARNING, 2003, 52 (03) : 217 - 237
  • [45] K-Means Divide and Conquer Clustering
    Khalilian, Madjid
    Boroujeni, Farsad Zamani
    Mustapha, Norwati
    Sulaiman, Md. Nasir
    2009 INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING, PROCEEDINGS, 2009, : 306 - 309
  • [46] A bad instance for k-means++
    Brunsch, Tobias
    Roeglin, Heiko
    THEORETICAL COMPUTER SCIENCE, 2013, 505 : 19 - 26
  • [47] Locality Sensitive K-means Clustering
    Liu, Chien-Liang
    Hsiao, Wen-Hoar
    Chang, Tao-Hsing
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2018, 34 (01) : 289 - 305
  • [48] Modified k-Means Clustering Algorithm
    Patel, Vaishali R.
    Mehta, Rupa G.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 307 - +
  • [49] Faster K-Means Cluster Estimation
    Khandelwal, Siddhesh
    Awekar, Amit
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 520 - 526
  • [50] The validity of pyramid K-means clustering
    Tamir, Dan E.
    Park, Chi-Yeon
    Yoo, Wook-Sung
    MATHEMATICS OF DATA/IMAGE PATTERN RECOGNITION, COMPRESSION, CODING, AND ENCRYPTION X, WITH APPLICATIONS, 2007, 6700