Robust trimmed k-means

被引:6
作者
Dorabiala, Olga [1 ]
Kutz, J. Nathan [1 ]
Aravkin, Aleksandr Y. [1 ]
机构
[1] Univ Washington, Dept Appl Math, Seattle, WA 98195 USA
关键词
k-Means; Clustering; Robust statistics; Trimming; Unsupervised learning; OUTLIER DETECTION; FRAMEWORK;
D O I
10.1016/j.patrec.2022.07.007
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Clustering is a fundamental tool in unsupervised learning, used to group objects by distinguishing be-tween similar and dissimilar features of a given data set. One of the most common clustering algorithms is k-means. Unfortunately, when dealing with real-world data many traditional clustering algorithms are compromised by lack of clear separation between groups, noisy observations, and/or outlying data points. Thus, robust statistical algorithms are required for successful data analytics. Current methods that robus-tify k-means clustering are specialized for either single or multi-membership data, but do not perform competitively in both cases. We propose an extension of the k-means algorithm, which we call Robust Trimmed k-means (RTKM) that simultaneously identifies outliers and clusters points and can be applied to either single-or multi-membership data. We test RTKM on various real-world datasets and show that RTKM performs competitively with other methods on single membership data with outliers and multi -membership data without outliers. We also show that RTKM leverages its relative advantages to outper-form other methods on multi-membership data containing outliers. (c) 2022 Published by Elsevier B.V.
引用
收藏
页码:9 / 16
页数:8
相关论文
共 38 条
[1]  
Ahmed M, 2013, C IND ELECT APPL, P577
[2]  
[Anonymous], SIAM NEWS
[3]   Trimmed Statistical Estimation via Variance Reduction [J].
Aravkin, Aleksandr ;
Davis, Damek .
MATHEMATICS OF OPERATIONS RESEARCH, 2020, 45 (01) :292-322
[4]   Fuzzy C-Means clustering algorithm for data with unequal cluster sizes and contaminated with noise and outliers: Review and development [J].
Askari, Salar .
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 165
[5]   Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality [J].
Attouch, Hedy ;
Bolte, Jerome ;
Redont, Patrick ;
Soubeyran, Antoine .
MATHEMATICS OF OPERATIONS RESEARCH, 2010, 35 (02) :438-457
[6]  
Banerjee A., 2005, P 11 ACM SIGKDD INT, P532
[7]  
ben NCir C.-E., 2013, IEEE INT C COMPUTER, P1, DOI DOI 10.1109/ICCAT.2013.6522010
[9]  
Bishop C., 2006, Pattern Recognition and Machine Learning
[10]   A Unified Sparse Optimization Framework to Learn Parsimonious Physics-Informed Models From Data [J].
Champion, Kathleen ;
Zheng, Peng ;
Aravkin, Aleksandr Y. ;
Brunton, Steven L. ;
Kutz, J. Nathan .
IEEE ACCESS, 2020, 8 :169259-169271