Hierarchical Clustering via Penalty-Based Aggregation and the Genie Approach

被引:4
作者
Gagolewski, Marek [1 ,2 ]
Cena, Anna [1 ]
Bartoszuk, Maciej [2 ]
机构
[1] Polish Acad Sci, Syst Res Inst, ul Newelska 6, PL-01447 Warsaw, Poland
[2] Warsaw Univ Technol, Fac Math & Informat Sci, ul Koszykowa 75, PL-00662 Warsaw, Poland
来源
MODELING DECISIONS FOR ARTIFICIAL INTELLIGENCE, (MDAI 2016) | 2016年 / 9880卷
关键词
Hierarchical clustering; Aggregation; Centroid; Gini-index; Genie algorithm;
D O I
10.1007/978-3-319-45656-0_16
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
The paper discusses a generalization of the nearest centroid hierarchical clustering algorithm. A first extension deals with the incorporation of generic distance-based penalty minimizers instead of the classical aggregation by means of centroids. Due to that the presented algorithm can be applied in spaces equipped with an arbitrary dissimilarity measure (images, DNA sequences, etc.). Secondly, a correction preventing the formation of clusters of too highly unbalanced sizes is applied: just like in the recently introduced Genie approach, which extends the single linkage scheme, the new method averts a chosen inequity measure (e.g., the Gini-,de Vergottini-,or Bonferroni-index) of cluster sizes from raising above a predefined threshold. Numerous benchmarks indicate that the introduction of such a correction increases the quality of the resulting clusterings significantly.
引用
收藏
页码:191 / 202
页数:12
相关论文
共 15 条
[1]  
Anderberg M.R., 1973, Probability and Mathematical Statistics, DOI DOI 10.1016/C2013-0-06161-0
[2]  
[Anonymous], 2016, PRACTICAL GUIDE AVER
[3]  
[Anonymous], 2016, R LANGUAGE ENV STAT
[4]   Classical inequality indices, welfare and illfare functions, and the dual decomposition [J].
Aristondo, Oihana ;
Luis Garcia-Lapresta, Jose ;
Lasso de la Vega, Casilda ;
Pereira, Ricardo Alberto Marques .
FUZZY SETS AND SYSTEMS, 2013, 228 :114-136
[5]  
Bortot S, 2015, ADV INTEL SYS RES, V89, P333
[6]   Fuzzy K-Minpen Clustering and K-nearest-minpen Classification Procedures Incorporating Generic Distance-Based Penalty Minimizers [J].
Cena, Anna ;
Gagolewski, Marek .
INFORMATION PROCESSING AND MANAGEMENT OF UNCERTAINTY IN KNOWLEDGE-BASED SYSTEMS, IPMU 2016, PT II, 2016, 611 :445-456
[7]  
Deza M. M., 2013, ENCY DISTANCES
[8]  
Gagolewski M., 2015, Data Fusion: Theory, Methods, and Applications
[9]   Genie: A new, fast, and outlier-resistant hierarchical clustering algorithm [J].
Gagolewski, Marek ;
Bartoszuk, Maciej ;
Cena, Anna .
INFORMATION SCIENCES, 2016, 363 :8-23
[10]  
García-Lapresta JL, 2015, ADV INTEL SYS RES, V89, P1140