Outlier-robust multi-view clustering for uncertain data

Cited by: 46
Authors
Sharma, Krishna Kumar [1, 2]
Seal, Ayan [1]
Affiliations
[1] PDPM Indian Inst Informat Technol Design & Mfg Ja, Dept Comp Sci & Engn, Jabalpur 482005, Madhya Pradesh, India
[2] Univ Kota, Dept Comp Sci & Informat, Kota 324005, Rajasthan, India
Keywords
Multi-view clustering; Uncertain data; Density estimation; k-medoids; S-divergence; MODELS;
DOI
10.1016/j.knosys.2020.106567
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multi-view clustering is drawing increasing attention in machine learning because real-world datasets frequently consist of multiple views, which provide complementary and consensus information. Because multiple views help reveal concealed patterns in uncertain data, they are considered in this study. A multi-view clustering algorithm alone, however, is not sufficient to increase accuracy; the similarity measure is equally important in uncertain data clustering. Existing similarity functions for clustering uncertain data suffer from several problems. A geometric distance-based similarity function cannot correctly capture the difference between uncertain data objects and their distributions when the objects heavily overlap in location. A divergence-based similarity function, on the other hand, cannot discriminate between different pairs of completely disjoint uncertain data objects. Thus, a self-adaptive mixture similarity function based on geometric distance and S-divergence is introduced for uncertain data clustering. The proposed similarity function is integrated with k-medoids based multi-view clustering. The proposed method reduces the effect of outliers and noise because it uses a threshold-based residual objective function in k-medoids. Finally, extensive experimental results on synthetic and real-world uncertain datasets illustrate that the proposed method consistently outperforms the state-of-the-art clustering algorithms. Experimental results also demonstrate the effectiveness and robustness of the proposed method against noise and outliers. (C) 2020 Elsevier B.V. All rights reserved.
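As a rough illustration of the two ingredients named in the abstract, the following Python sketch computes an element-wise S-divergence between two strictly positive vectors (for example, density values sampled at the same points) and blends it with a Euclidean distance between object locations. The function names, the inputs, and the fixed weight `alpha` are assumptions made for illustration only; the record does not give the paper's self-adaptive weighting or its exact formulation, so this should not be read as the authors' method.

```python
import math


def s_divergence(p, q):
    """Element-wise S-divergence between two sequences of strictly positive values.

    Each term is log((a + b) / 2) - (log a + log b) / 2, which is zero when
    a == b and positive otherwise.
    """
    if len(p) != len(q):
        raise ValueError("inputs must have the same length")
    total = 0.0
    for a, b in zip(p, q):
        if a <= 0 or b <= 0:
            raise ValueError("all values must be strictly positive")
        total += math.log((a + b) / 2.0) - 0.5 * (math.log(a) + math.log(b))
    return total


def euclidean(x, y):
    """Plain Euclidean (geometric) distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def mixture_dissimilarity(loc_x, loc_y, dens_x, dens_y, alpha=0.5):
    """Hypothetical fixed-weight blend of geometric distance and S-divergence.

    ASSUMPTION: alpha is a fixed constant here; the paper describes a
    self-adaptive weighting whose form is not stated in this record.
    """
    geometric = euclidean(loc_x, loc_y)
    divergence = s_divergence(dens_x, dens_y)
    return alpha * geometric + (1.0 - alpha) * divergence
```

With `alpha=1.0` the blend reduces to the geometric distance alone, and with `alpha=0.0` to the divergence alone, which mirrors the two failure cases the abstract describes for each measure used in isolation.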
Pages: 14