The Minkowski central partition as a pointer to a suitable distance exponent and consensus partitioning

Cited by: 15
Authors
de Amorim, Renato Cordeiro [1]
Shestakov, Andrei [2]
Mirkin, Boris [2,3]
Makarenkov, Vladimir [4]
Affiliations
[1] Univ Hertfordshire, Sch Comp Sci, Coll Lane, Hatfield AL10 9AB, Herts, England
[2] Natl Res Univ, Higher Sch Econ, Dept Data Anal & Machine Intelligence, Moscow, Russia
[3] Birkbeck Univ London, Dept Comp Sci & Informat Syst, Malet St, London WC1E 7HX, England
[4] Univ Quebec Montreal, Dept Informat, CP 8888 Succ Ctr Ville, Montreal, PQ H3C 3P8, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Clustering; Central clustering; Feature weighting; Minkowski metric; Minkowski ensemble; DIVERSITY; ALGORITHM; CLUSTERS; NUMBER;
DOI
10.1016/j.patcog.2017.02.001
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The Minkowski weighted K-means (MWK-means) is a recently developed clustering algorithm capable of computing feature weights. The cluster-specific weights in MWK-means follow the intuitive idea that a feature with low variance should have a greater weight than a feature with high variance. The final clustering found by this algorithm depends on the choice of the Minkowski distance exponent. This paper explores the possibility of using the central Minkowski partition in the ensemble of all Minkowski partitions to select an optimal value of the Minkowski exponent. The central Minkowski partition also appears to be a good consensus partition. Furthermore, we discovered striking correlations between the Minkowski profile, defined as a mapping of the Minkowski exponent values into the average similarity values of the optimal Minkowski partitions, and the Adjusted Rand Index vectors resulting from the comparison of the obtained partitions to the ground truth. Our findings were confirmed by a series of computational experiments involving synthetic Gaussian clusters and real-world data. (C) 2017 Elsevier Ltd. All rights reserved.
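The idea of a Minkowski profile and a central partition can be illustrated with a minimal sketch. It assumes that one partition per exponent value has already been obtained (MWK-means itself is not implemented here), that the Adjusted Rand Index serves as the similarity measure between partitions, and that the central partition is the one with the highest average similarity to the others. The function names and the toy label vectors are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a, b):
    """Adjusted Rand Index between two label sequences of equal length."""
    n = len(a)
    # Pair counts within contingency cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. both partitions trivial
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def minkowski_profile(partitions):
    """Map each exponent p to the mean ARI of its partition with all others."""
    profile = {}
    for p, labels in partitions.items():
        others = [q for q in partitions if q != p]
        total = sum(adjusted_rand_index(labels, partitions[q]) for q in others)
        profile[p] = total / len(others)
    return profile


def central_exponent(partitions):
    """Exponent whose partition is, on average, most similar to the rest."""
    profile = minkowski_profile(partitions)
    return max(profile, key=profile.get)


# Hypothetical toy ensemble: exponent value -> cluster labels of six objects.
toy_partitions = {
    1.5: [0, 0, 0, 1, 1, 0],
    2.0: [0, 0, 0, 1, 1, 1],
    3.0: [0, 0, 1, 1, 1, 1],
}
print(minkowski_profile(toy_partitions))
print(central_exponent(toy_partitions))
```

In this toy ensemble the partition for exponent 2.0 differs from each of the other two by a single object, while those two differ from each other by two objects, so it sits in the middle of the ensemble. In practice the label vectors would come from running the clustering algorithm over a grid of exponent values.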
Pages: 62-72
Number of pages: 11
Related papers
38 in total
[1]  
[Anonymous], 2010, v7.10.0 (R2010a)
[2]   An extensive comparative study of cluster validity indices [J].
Arbelaitz, Olatz ;
Gurrutxaga, Ibai ;
Muguerza, Javier ;
Perez, Jesus M. ;
Perona, Inigo .
PATTERN RECOGNITION, 2013, 46 (01) :243-256
[3]   Discovering multi-level structures in bio-molecular data through the Bernstein inequality [J].
Bertoni, Alberto ;
Valentini, Giorgio .
BMC BIOINFORMATICS, 2008, 9 (Suppl 2)
[4]  
Calinski T., 1974, Commun StatTheory Methods, V3, P1, DOI 10.1080/03610927408827101
[5]   An optimization algorithm for clustering using weighted dissimilarity measures [J].
Chan, EY ;
Ching, WK ;
Ng, MK ;
Huang, JZ .
PATTERN RECOGNITION, 2004, 37 (05) :943-952
[6]  
de Amorim Renato Cordeiro, 2012, Advances in Intelligent Data Analysis XI. Proceedings 11th International Symposium, IDA 2012, P45, DOI 10.1007/978-3-642-34156-4_6
[7]   A Survey on Feature Weighting Based K-Means Algorithms [J].
de Amorim, Renato Cordeiro .
JOURNAL OF CLASSIFICATION, 2016, 33 (02) :210-242
[8]   Recovering the number of clusters in data sets with noise features using feature rescaling factors [J].
de Amorim, Renato Cordeiro ;
Hennig, Christian .
INFORMATION SCIENCES, 2015, 324 :126-145
[9]   Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering [J].
de Amorim, Renato Cordeiro ;
Mirkin, Boris .
PATTERN RECOGNITION, 2012, 45 (03) :1061-1075
[10]  
Field A., 2012, DISCOVERING STAT USI