Robust trimmed k-means

Cited by: 6
Authors
Dorabiala, Olga [1 ]
Kutz, J. Nathan [1 ]
Aravkin, Aleksandr Y. [1 ]
Affiliations
[1] Univ Washington, Dept Appl Math, Seattle, WA 98195 USA
Keywords
k-Means; Clustering; Robust statistics; Trimming; Unsupervised learning; OUTLIER DETECTION; FRAMEWORK;
DOI
10.1016/j.patrec.2022.07.007
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing between similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robustify k-means clustering are specialized for either single or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM), that simultaneously identifies outliers and clusters points and can be applied to either single- or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single-membership data with outliers and multi-membership data without outliers. We also show that RTKM leverages its relative advantages to outperform other methods on multi-membership data containing outliers. (c) 2022 Published by Elsevier B.V.
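The trimming idea the abstract describes can be illustrated with a generic sketch: at each iteration, assign points to their nearest centroid, flag a fixed fraction of points farthest from any centroid as outliers, and recompute centroids from the remaining points only. This is a hedged, minimal sketch of classical trimmed k-means for single-membership data, not the authors' exact RTKM formulation; the function name, parameters, and initialization scheme are illustrative assumptions.

```python
import numpy as np

def trimmed_kmeans(X, k, trim_frac=0.1, n_iter=50, seed=0):
    """Illustrative trimmed k-means (not the authors' exact RTKM):
    each iteration assigns points to nearest centroids, trims the
    trim_frac fraction of points farthest from any centroid, and
    updates centroids from the untrimmed points only."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_trim = int(trim_frac * n)
    # simple random initialization from the data points (an assumption)
    centroids = X[rng.choice(n, size=k, replace=False)].astype(float)
    mask = np.ones(n, dtype=bool)
    for _ in range(n_iter):
        # squared distance from every point to every centroid
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        nearest = d.min(axis=1)
        # trim: drop the n_trim points with the largest nearest-centroid distance
        mask = np.ones(n, dtype=bool)
        if n_trim:
            mask[np.argsort(nearest)[-n_trim:]] = False
        # recompute each centroid from its untrimmed members
        for j in range(k):
            pts = X[mask & (labels == j)]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids, labels, ~mask  # ~mask marks the flagged outliers
```

Because the trimmed points are excluded from the centroid update, a few extreme outliers cannot drag the centroids away from the bulk of the data, which is the failure mode of plain k-means that the paper targets.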
Pages: 9 - 16
Page count: 8
Related papers
50 records
  • [41] Random Projection for k-means Clustering
    Sieranoja, Sami
    Franti, Pasi
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2018, PT I, 2018, 10841 : 680 - 689
  • [42] A notion of stability for k-means clustering
    Le Gouic, T.
    Paris, Q.
    ELECTRONIC JOURNAL OF STATISTICS, 2018, 12 (02): : 4239 - 4263
  • [43] The MinMax k-Means clustering algorithm
    Tzortzis, Grigorios
    Likas, Aristidis
    PATTERN RECOGNITION, 2014, 47 (07) : 2505 - 2516
  • [44] Feature weighting in k-means clustering
    Modha, DS
    Spangler, WS
    MACHINE LEARNING, 2003, 52 (03) : 217 - 237
  • [45] K-Means Divide and Conquer Clustering
    Khalilian, Madjid
    Boroujeni, Farsad Zamani
    Mustapha, Norwati
    Sulaiman, Md. Nasir
    2009 INTERNATIONAL CONFERENCE ON COMPUTER AND AUTOMATION ENGINEERING, PROCEEDINGS, 2009, : 306 - 309
  • [46] A bad instance for k-means++
    Brunsch, Tobias
    Roeglin, Heiko
    THEORETICAL COMPUTER SCIENCE, 2013, 505 : 19 - 26
  • [47] Locality Sensitive K-means Clustering
    Liu, Chien-Liang
    Hsiao, Wen-Hoar
    Chang, Tao-Hsing
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2018, 34 (01) : 289 - 305
  • [48] Modified k-Means Clustering Algorithm
    Patel, Vaishali R.
    Mehta, Rupa G.
    COMPUTATIONAL INTELLIGENCE AND INFORMATION TECHNOLOGY, 2011, 250 : 307 - +
  • [49] Faster K-Means Cluster Estimation
    Khandelwal, Siddhesh
    Awekar, Amit
    ADVANCES IN INFORMATION RETRIEVAL, ECIR 2017, 2017, 10193 : 520 - 526
  • [50] The validity of pyramid K-means clustering
    Tamir, Dan E.
    Park, Chi-Yeon
    Yoo, Wook-Sung
    MATHEMATICS OF DATA/IMAGE PATTERN RECOGNITION, COMPRESSION, CODING, AND ENCRYPTION X, WITH APPLICATIONS, 2007, 6700