A-Wardpβ: Effective hierarchical clustering using the Minkowski metric and a fast k-means initialisation

被引:12
作者
de Amorim, Renato Cordeiro [1 ]
Makarenkov, Vladimir [2 ]
Mirkin, Boris [3 ,4 ]
机构
[1] Univ Hertfordshire, Sch Comp Sci, Coll Lane Campus, Hatfield AL10 9AB, Herts, England
[2] Univ Quebec, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
[3] Natl Res Univ, Higher Sch Econ, Dept Data Anal & Machine Intelligence, Moscow, Russia
[4] Birkbeck Univ London, Dept Comp Sci & Informat Syst, Malet St, London WC1E 7HX, England
关键词
Initialisation algorithm; Minkowski metric; Hierarchical clustering; Feature weighting; ALGORITHM;
D O I
10.1016/j.ins.2016.07.076
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper we make two novel contributions to hierarchical clustering. First, we introduce an anomalous pattern initialisation method for hierarchical clustering algorithms, called A-Ward, capable of substantially reducing the time they take to converge. This method generates an initial partition with a sufficiently large number of clusters. This allows the cluster merging process to start from this partition rather than from a trivial partition composed solely of singletons. Our second contribution is an extension of the Ward and Ward(p) algorithms to the situation where the feature weight exponent can differ from the exponent of the Minkowski distance. This new method, called A-Ward(p beta), is able to generate a much wider variety of clustering solutions. We also demonstrate that its parameters can be estimated reasonably well by using a cluster validity index. We perform numerous experiments using data sets with two types of noise, insertion of noise features and blurring within-cluster values of some features. These experiments allow us to conclude: (i) our anomalous pattern initialisation method does indeed reduce the time a hierarchical clustering algorithm takes to complete, without negatively impacting its cluster recovery ability; (ii) A-Ward(p beta) provides better cluster recovery than both Ward and Wardp. (C) 2016 Elsevier Inc. All rights reserved.
引用
收藏
页码:343 / 354
页数:12
相关论文
共 41 条
[1]  
[Anonymous], SOC SCI RES NETW
[2]  
[Anonymous], 2012, MATLAB STAT TOOLB RE
[3]  
[Anonymous], DATA RECOVERY APPROA
[4]  
[Anonymous], KDD WORKSH TEXT MIN
[5]  
[Anonymous], ACM J EXPT ALGORITHM
[6]  
[Anonymous], COMPSTAT LECT VIENNA
[7]  
[Anonymous], R STATS PACKAGE VERS
[8]   An extensive comparative study of cluster validity indices [J].
Arbelaitz, Olatz ;
Gurrutxaga, Ibai ;
Muguerza, Javier ;
Perez, Jesus M. ;
Perona, Inigo .
PATTERN RECOGNITION, 2013, 46 (01) :243-256
[9]   A CLUSTERING TECHNIQUE FOR SUMMARIZING MULTIVARIATE DATA [J].
BALL, GH ;
HALL, DJ .
BEHAVIORAL SCIENCE, 1967, 12 (02) :153-&
[10]   FCM - THE FUZZY C-MEANS CLUSTERING-ALGORITHM [J].
BEZDEK, JC ;
EHRLICH, R ;
FULL, W .
COMPUTERS & GEOSCIENCES, 1984, 10 (2-3) :191-203