Robust multi-view k-means clustering with outlier removal

Cited by: 25
Authors
Chen, Chuan [1, 2]
Wang, Yu [1 ]
Hu, Weibo [1 ]
Zheng, Zibin [1, 2]
Affiliations
[1] Sun Yat-sen University, School of Data and Computer Science, Guangzhou, People's Republic of China
[2] Sun Yat-sen University, National Engineering Research Center of Digital Life, Guangzhou, People's Republic of China
Funding
National Natural Science Foundation of China;
Keywords
Multi-view clustering; Robust clustering; K-means; Outlier detection; OBJECT CLASSES; ALGORITHM;
DOI
10.1016/j.knosys.2020.106518
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Contemporary datasets often comprise multiple views of data, which provide complete and complementary information, and multi-view clustering is one of the most crucial techniques in multi-view data analysis. However, traditional multi-view clustering methods are sensitive to noise and outliers and suffer severe performance degradation when the dataset contains many outliers. Moreover, commonly used multi-view clustering methods are restricted by high time complexity. To address these problems, we propose a robust multi-view k-means algorithm with outlier detection, Multi-View Clustering with Outlier Removal (MVCOR). The method is designed to remove outliers and thus boost clustering performance on multi-view data with low time complexity. By defining two types of outliers, MVCOR uses a well-defined outlier removal strategy to categorize all outliers into two specific clusters while performing robust clustering on the clean data at the same time. This strategy significantly improves both clustering performance and model robustness, making MVCOR a more practical approach for real-world scenarios. In addition, the proposed model is efficiently optimized by a well-designed alternating minimization algorithm that is rigorously proven to converge. Extensive experiments on both synthetic and real-world datasets demonstrate that MVCOR consistently outperforms related clustering methods in clustering performance and robustness to outliers, and achieves performance comparable to state-of-the-art multi-view outlier detection methods. (C) 2020 Elsevier B.V. All rights reserved.
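The general idea of alternating between cluster assignment and outlier removal across views can be illustrated with a minimal sketch. This is not MVCOR: the abstract does not give the objective function or the definitions of the two outlier types, so the code below assumes a much simpler trimmed k-means in which all views share one label vector, distances are summed with equal weight over views, and a fixed number of highest-cost points is excluded from each centroid update. The function name `trimmed_multiview_kmeans` and all of its parameters are hypothetical.

```python
import numpy as np

def trimmed_multiview_kmeans(views, k, n_outliers, n_iter=20, init=None, seed=0):
    """Simplified trimmed multi-view k-means (an illustration, not MVCOR).

    views      : list of (n, d_v) arrays whose rows describe the same n objects
    k          : number of clusters
    n_outliers : number of points flagged as outliers in each iteration (assumed known)
    init       : optional indices of k points used as initial centroids
    Returns (labels, outlier_mask).
    """
    views = [np.asarray(X, dtype=float) for X in views]
    n = views[0].shape[0]
    rng = np.random.default_rng(seed)
    init = rng.choice(n, size=k, replace=False) if init is None else np.asarray(init)
    # One centroid matrix per view, all initialised from the same k objects.
    centroids = [X[init].copy() for X in views]

    for _ in range(n_iter):
        # Squared Euclidean distance to every centroid, summed over views: shape (n, k).
        dist = sum(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
                   for X, C in zip(views, centroids))
        labels = dist.argmin(axis=1)
        cost = dist.min(axis=1)

        # Flag the n_outliers points with the largest cost.
        outlier = np.zeros(n, dtype=bool)
        if n_outliers > 0:
            outlier[np.argsort(cost)[-n_outliers:]] = True

        # Update each cluster's centroid in every view from its inliers only.
        for j in range(k):
            members = (labels == j) & ~outlier
            if members.any():
                for C, X in zip(centroids, views):
                    C[j] = X[members].mean(axis=0)

    return labels, outlier
```

Because flagged points never contribute to a centroid, a few extreme observations cannot drag the cluster centers away from the bulk of the data, which is the behavior the abstract attributes to outlier removal. The sketch still depends on initialization: if an outlier is chosen as an initial centroid it can keep its own cluster, so passing `init` explicitly or restarting with several seeds is advisable.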
Pages: 12