An Adaptive Initial Cluster Centers Selection Algorithm for High-dimensional Partition Clustering

Cited by: 2
Authors
Gao, Zhipeng [1]
Fan, Yidan [1]
Affiliation
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing, Peoples R China
Source
2017 IEEE 15TH INTL CONF ON DEPENDABLE, AUTONOMIC AND SECURE COMPUTING, 15TH INTL CONF ON PERVASIVE INTELLIGENCE AND COMPUTING, 3RD INTL CONF ON BIG DATA INTELLIGENCE AND COMPUTING AND CYBER SCIENCE AND TECHNOLOGY CONGRESS(DASC/PICOM/DATACOM/CYBERSCI | 2017
Keywords
high-dimensional partition clustering; similarity measure; density measure; initial cluster centers; outliers;
DOI
10.1109/DASC-PICom-DataCom-CyberSciTec.2017.181
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Cluster analysis is the process of partitioning a set of data objects into subsets, called clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters. Partitioning methods start from an initial partition and seek the optimal partition by iterative relocation. The results of partition clustering depend heavily on the selection of initial cluster centers. Traditional distance-based initialization methods become inefficient because of the inherent sparsity of high-dimensional data and the curse of dimensionality, while existing improved methods are very sensitive to their parameters. To address these problems, we propose a new initialization method for high-dimensional partition clustering that adaptively chooses high-density, low-similarity initial cluster centers and identifies outliers according to the local structure of the data in high-dimensional space. Experiments on both synthetic and real-world datasets show that the proposed algorithm can achieve better performance.
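The idea of seeding with high-density, mutually dissimilar points can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify: it assumes Euclidean distance as a stand-in for the paper's similarity measure, uses the inverse mean distance to the nearest neighbors as a stand-in density measure, and omits outlier identification. The names `select_initial_centers` and `n_neighbors` are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_initial_centers(points, k, n_neighbors=3):
    """Return indices of k seed points that are dense and far apart.

    Assumptions (not taken from the paper): density is the inverse of the
    mean distance to the n_neighbors nearest points, and dissimilarity is
    the Euclidean distance to the closest seed chosen so far.
    """
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")

    # Pairwise distance matrix.
    dist = [[euclidean(p, q) for q in points] for p in points]

    # Local density: points with close neighbors get a high value.
    m = min(n_neighbors, n - 1)
    density = []
    for i in range(n):
        nearest = sorted(dist[i][j] for j in range(n) if j != i)[:m]
        mean_d = sum(nearest) / m if m else 0.0
        density.append(1.0 / (mean_d + 1e-12))

    # First seed: the densest point.
    centers = [max(range(n), key=lambda i: density[i])]

    # Remaining seeds: maximize density times distance to the nearest seed.
    while len(centers) < k:
        candidates = [i for i in range(n) if i not in centers]
        centers.append(
            max(candidates,
                key=lambda i: density[i] * min(dist[i][c] for c in centers))
        )
    return centers
```

Weighting the distance to existing seeds by density keeps an isolated point from being chosen merely because it is far away, which is one simple way to make seeding less sensitive to outliers.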
Pages: 1119-1126
Number of pages: 8