An Efficient Grid-based Clustering Method by Finding Density Peaks

被引：0

作者：

Wu, Bo ^{[1
]}

Wilamowski, B. M. ^{[1
]}

机构：

[1] Auburn Univ, Dept Elect & Comp Engn, Auburn, AL 36849 USA

来源：

PROCEEDINGS OF THE IECON 2016 - 42ND ANNUAL CONFERENCE OF THE IEEE INDUSTRIAL ELECTRONICS SOCIETY | 2016年

关键词：

clustering; grid; density peaks; efficiency; SEARCH;

D O I：

暂无

中图分类号：

TM [电工技术]; TN [电子技术、通信技术];

学科分类号：

0808 ; 0809 ;

摘要：

Clustering or categorizing an unprocessed data set is essential and critical in many areas. Much success has been published, which first needs to calculate the mutual distances between data points. It suffers from considerable computational costs, preventing the state-of-the-art methods such as the clustering method by fast search and find of density peaks (FSFDP, published in Science, 2014) from applying into real life (e.g., with thousands of data points). In this paper, an efficient grid-based clustering (GBC) method by finding density peaks is described. It keeps the advantage of the friendly interactive interface in the FSFDP, at the mean time, decreases enormously the computation complexity. The time complexity of the FSFDP is o(np(np 1)/2) while our method decreases it to o(np * sizeof (grid)), where np is the number of data points and the size of grid is always much smaller than np so that the time complexity of our approach is almost linearly proportional to np. The presented GBC method by finding density peaks was able to calculate the densities and categorize datasets within much less time, which makes the density-peak-based algorithm practical. By using the presented algorithm, it was possible to cluster high dimensional data sets as well. The GBC method by finding density peaks was successfully verified in clustering several datasets, which are commonly used to test clustering algorithms in published articles. It turned out that the presented method is much faster and efficient in clustering datasets into different categories than the conventional density-based ones, which makes the proposed method more preferable.

引用

页码：837 / 842

页数：6

共 15 条

[1]

Chui KT, 2013, IEEE IND ELEC, P8420, DOI 10.1109/IECON.2013.6700545

[2] MAXIMUM LIKELIHOOD FROM INCOMPLETE DATA VIA EM ALGORITHM [J].

DEMPSTER, AP ;

LAIRD, NM ;

RUBIN, DB .

JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1977, 39 (01) :1-38

[3]

Dunn J. C., 1973, Journal of Cybernetics, V3, P32, DOI 10.1080/01969727308546046

[4]

Ester M, 1996, P 2 INT C KNOWLEDGE, DOI DOI 10.5555/3001460.3001507

[5] Iterative shrinking method for clustering problems [J].

Fränti, P ;

Virmajoki, I .

PATTERN RECOGNITION, 2006, 39 (05) :761-775

[6] FLAME, a novel fuzzy clustering method for the analysis of DNA microarray data [J].

Fu, Limin ;

Medico, Enzo .

BMC BIOINFORMATICS, 2007, 8

[7]

Hernández DC, 2014, PROC IEEE INT SYMP, P1967, DOI 10.1109/ISIE.2014.6864917

[8] Real-Time Implementation of a Harmony Search Algorithm-Based Clustering Protocol for Energy-Efficient Wireless Sensor Networks [J].

Hoang, Duc Chinh ;

Yadav, Parikshit ;

Kumar, Rajesh ;

Panda, Sanjib Kumar .

IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2014, 10 (01) :774-783

[9]

Jain A. K., 1988, Algorithms for Clustering Data

[10] HIERARCHICAL CLUSTERING SCHEMES [J].

JOHNSON, SC .

PSYCHOMETRIKA, 1967, 32 (03) :241-254

← 1 2 →