Robust trimmed k-means

Cited by: 6
Authors
Dorabiala, Olga [1]
Kutz, J. Nathan [1]
Aravkin, Aleksandr Y. [1]
Affiliations
[1] Univ Washington, Dept Appl Math, Seattle, WA 98195 USA
Keywords
k-Means; Clustering; Robust statistics; Trimming; Unsupervised learning; Outlier detection; Framework
DOI
10.1016/j.patrec.2022.07.007
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single- or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers. © 2022 Published by Elsevier B.V.
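To make the idea of identifying outliers while clustering concrete, here is a minimal Python sketch of plain trimmed k-means for single-membership data. It is not the RTKM algorithm proposed in the paper, whose formulation is not given in this record; it only illustrates the general trimming idea of excluding the points farthest from their nearest center before each center update. The function name `trimmed_kmeans` and the parameters `n_trim` and `init_centers` are assumptions made for this illustration.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def trimmed_kmeans(points, k, n_trim, n_iter=100, seed=0, init_centers=None):
    """Illustrative trimmed k-means (hypothetical helper, not the paper's RTKM).

    Returns (centers, labels, outliers), where outliers is a set of indices
    of the n_trim points farthest from their nearest center.
    """
    rng = random.Random(seed)
    if init_centers is None:
        init_centers = rng.sample(points, k)
    centers = [list(c) for c in init_centers]
    labels, outliers = [], set()
    for _ in range(n_iter):
        # Assignment step: nearest center and squared distance for each point.
        labels, nearest = [], []
        for p in points:
            d = [sq_dist(p, c) for c in centers]
            j = min(range(k), key=d.__getitem__)
            labels.append(j)
            nearest.append(d[j])
        # Trimming step: flag the n_trim points with the largest distances.
        order = sorted(range(len(points)), key=nearest.__getitem__)
        outliers = set(order[len(points) - n_trim:])
        # Update step: recompute each center from its non-trimmed members.
        new_centers = []
        for j in range(k):
            members = [points[i] for i in range(len(points))
                       if labels[i] == j and i not in outliers]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels, outliers


# Two tight groups plus one far-away point (index 8).
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (100, 100)]
centers, labels, outliers = trimmed_kmeans(
    pts, k=2, n_trim=1, init_centers=[(0, 0), (10, 10)])
print(sorted(outliers))  # [8]
```

The initial centers are passed explicitly here because, like ordinary k-means, this sketch depends on initialization: a random start that happens to pick the far-away point as a center can leave it as its own cluster and trim a regular point instead.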
Pages: 9-16
Number of pages: 8