Clustering Ensemble Based on Sample’s Certainty

Cited: 0
Authors
Xia Ji
Shuaishuai Liu
Peng Zhao
Xuejun Li
Qiong Liu
Affiliations
[1] Anhui University, School of Computer Science and Technology
[2] Science and Technology Information Office
[3] Department of Public Security of Anhui Province
Source
Cognitive Computation | 2021 / Volume 13
Keywords
Clustering ensemble; Sample’s certainty; Base partition; Co-association matrix
DOI
Not available
CLC Number
Subject Classification Number
Abstract
The objective of clustering ensemble is to fuse multiple base partitions (BPs) to find the underlying data structure. It has been observed that a sample can change its neighbors across different BPs, and that different samples differ in the stability of their neighbor relationships. This difference suggests that samples may contribute differently to the detection of the underlying data structure. In addition, clustering ensemble aims to integrate the inconsistent parts of the BPs after first extracting the consistent parts. However, existing clustering ensemble methods treat all samples equally: they consider neither a sample's relationship stability nor whether it belongs to the consistent or the inconsistent part of the BPs. To address these deficiencies, we introduce the certainty of a sample to quantify the stability of its neighbor relationships and propose a formula to calculate this certainty. We then develop a clustering ensemble algorithm based on the sample's certainty. The algorithm rests on the following observation: the neighbor relationships of cluster-core samples are more stable across BPs, and samples from different cluster cores rarely form neighbor relationships in the BPs. According to the sample's certainty, the algorithm divides a dataset into two subsets, cluster-core samples and cluster-halo samples. It then discovers a clear core structure using the cluster-core samples and gradually assigns the cluster-halo samples to that core structure. Experiments on six synthetic datasets illustrate how our algorithm works. The algorithm achieves excellent performance, outperforming twelve state-of-the-art clustering ensemble algorithms on twelve real datasets.
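The abstract describes the pipeline only at a high level (certainty computation, core/halo split, core clustering, halo assignment). The sketch below is a minimal illustration of that pipeline and not the paper's method: the certainty score used here (average distance of a sample's co-association values from 0.5), the median split into core and halo samples, the average-linkage consensus step, and the names co_association, sample_certainty, and certainty_ensemble are all illustrative assumptions.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform


def co_association(base_partitions):
    """Co-association matrix: fraction of base partitions (BPs) in which
    two samples are placed in the same cluster."""
    bps = np.asarray(base_partitions)          # shape (n_bps, n_samples)
    n = bps.shape[1]
    ca = np.zeros((n, n))
    for labels in bps:
        ca += (labels[:, None] == labels[None, :]).astype(float)
    return ca / bps.shape[0]


def sample_certainty(ca):
    """Illustrative certainty: 1 when a sample's co-association values are
    all 0 or 1 (stable neighbor relationships), 0 when they are all 0.5
    (maximally unstable). A stand-in, not the paper's formula."""
    n = ca.shape[0]
    off_diag = ~np.eye(n, dtype=bool)
    return np.array([np.abs(2.0 * ca[i, off_diag[i]] - 1.0).mean() for i in range(n)])


def certainty_ensemble(base_partitions, n_clusters, core_quantile=0.5):
    """Split samples into cluster-core and cluster-halo sets by certainty,
    cluster the core, then assign halo samples to the closest core cluster."""
    ca = co_association(base_partitions)
    cert = sample_certainty(ca)
    core = cert >= np.quantile(cert, core_quantile)     # high-certainty samples
    core_idx = np.flatnonzero(core)

    # Consensus clustering of the core: average linkage on (1 - co-association).
    dist = 1.0 - ca[np.ix_(core_idx, core_idx)]
    np.fill_diagonal(dist, 0.0)
    core_labels = fcluster(linkage(squareform(dist, checks=False), method="average"),
                           n_clusters, criterion="maxclust")

    # Assign each halo sample to the core cluster it co-associates with most.
    labels = np.zeros(ca.shape[0], dtype=int)
    labels[core_idx] = core_labels
    clusters = np.unique(core_labels)
    for i in np.flatnonzero(~core):
        votes = [ca[i, core_idx[core_labels == c]].mean() for c in clusters]
        labels[i] = clusters[int(np.argmax(votes))]
    return labels


# Toy usage: a few k-means runs on synthetic blobs serve as the base partitions.
if __name__ == "__main__":
    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=200, centers=3, random_state=0)
    bps = [KMeans(n_clusters=3, n_init=5, random_state=s).fit_predict(X)
           for s in range(5)]
    print(certainty_ensemble(bps, n_clusters=3)[:20])
```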
Pages: 1034-1046
Page count: 12