A quality driven Hierarchical Data Divisive Soft Clustering for information retrieval

Times cited: 27
Authors
Bordogna, Gloria [2]
Pasi, Gabriella [1]
Affiliations
[1] Univ Milano Bicocca DISCo, I-20126 Milan, Italy
[2] CNR Natl Res Council IDPA, Dalmine, BG, Italy
Keywords
Soft Hierarchical Clustering; Fuzzy C-Means; Cluster's quality; Document clustering; Quality measures; FUZZY; ALGORITHMS; VALIDITY
DOI
10.1016/j.knosys.2011.06.012
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper, an adaptive hierarchical fuzzy clustering algorithm named Hierarchical Data Divisive Soft Clustering (H2D-SC) is presented. The main novelty of the proposed algorithm is that it is quality-driven: it dynamically evaluates a multi-dimensional quality measure of the clusters to drive the generation of the soft hierarchy. Specifically, it generates a hierarchy in which each node is split into a variable number of sub-nodes, determined by an innovative quality assessment of soft clusters that evaluates multiple dimensions, such as the cluster's cohesion, cardinality, mass and fuzziness, as well as the partition's entropy. Clusters at the same hierarchical level share a minimum quality value, and this minimum increases towards the lower levels of the hierarchy; in this way, more specific clusters (lower-level clusters) have a higher quality than more general clusters (upper-level clusters). Further, since the algorithm generates a soft partition, a document can belong to several sub-clusters with distinct membership degrees. The proposed algorithm is divisive and is based on a combination of a modified bisecting K-Means algorithm with a flat soft clustering algorithm used to partition each node. The paper describes the algorithm and its evaluation on two standard collections. © 2011 Elsevier B.V. All rights reserved.
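The abstract names the building blocks of the method without giving formulas, so the following is only a minimal Python sketch under stated assumptions, not the authors' algorithm. It uses standard Fuzzy C-Means (Bezdek's formulation, listed in the keywords) as the flat soft clustering step, and Bezdek's partition entropy alone as a stand-in for H2D-SC's multi-dimensional quality measure. The helper names `fuzzy_c_means`, `partition_entropy` and `divisive_soft_hierarchy`, the alpha-cut threshold `alpha`, and the stopping parameters `max_depth`, `max_c` and `min_size` are hypothetical. The sketch omits the modified bisecting K-Means component and the cohesion, cardinality, mass and fuzziness criteria.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6):
    """Standard Fuzzy C-Means (Bezdek), fuzzifier m > 1.

    Returns (centers, U); U[i][k] is the membership of point i in cluster k,
    and every row of U sums to 1.
    """
    # Deterministic farthest-first initialisation of the c centres.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda p: min(math.dist(p, q) for q in centers))
        centers.append(list(far))
    dim, U = len(points[0]), None
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1)).
        new_U = []
        for p in points:
            d = [max(math.dist(p, q), 1e-12) for q in centers]
            new_U.append([1.0 / sum((dk / dj) ** (2.0 / (m - 1)) for dj in d)
                          for dk in d])
        # Centre update: mean of the points weighted by u_ik ** m.
        centers = []
        for k in range(c):
            w = [row[k] ** m for row in new_U]
            total = sum(w)
            centers.append([sum(wi * p[j] for wi, p in zip(w, points)) / total
                            for j in range(dim)])
        # Stop once no membership changes by more than tol.
        done = U is not None and max(
            abs(a - b) for r0, r1 in zip(U, new_U) for a, b in zip(r0, r1)) < tol
        U = new_U
        if done:
            break
    return centers, U


def partition_entropy(U):
    """Bezdek's partition entropy: 0 for a crisp partition, higher when fuzzier."""
    return -sum(u * math.log(u) for row in U for u in row if u > 0) / len(U)


def divisive_soft_hierarchy(points, ids, depth=0, max_depth=2, max_c=4,
                            min_size=4, alpha=0.3):
    """Hypothetical quality-driven divisive loop, NOT the paper's H2D-SC.

    Partition entropy stands in for the multi-dimensional quality measure.
    """
    node = {"members": list(ids), "children": []}
    if depth >= max_depth or len(ids) < max(min_size, 3):
        return node
    sub = [points[i] for i in ids]
    best = None
    # Try c = 2 .. max_c sub-clusters and keep the crispest soft partition.
    for c in range(2, min(max_c, len(ids) - 1) + 1):
        _, U = fuzzy_c_means(sub, c)
        score = partition_entropy(U)
        if best is None or score < best[0]:
            best = (score, c, U)
    _, c, U = best
    for k in range(c):
        # Alpha-cut: a point joins every child in which its membership is at
        # least alpha, so it may appear under several children.
        child = [ids[i] for i in range(len(ids)) if U[i][k] >= alpha]
        if child and len(child) < len(ids):
            node["children"].append(divisive_soft_hierarchy(
                points, child, depth + 1, max_depth, max_c, min_size, alpha))
    return node


if __name__ == "__main__":
    # Two well-separated toy "document" groups in a 2-D feature space.
    docs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
    tree = divisive_soft_hierarchy(docs, list(range(len(docs))), max_depth=1)
    print([child["members"] for child in tree["children"]])
    # prints [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because raw partition entropy tends to grow with the number of clusters, this stand-in usually favours two-way splits; in H2D-SC it is the multi-dimensional quality assessment that determines a variable number of sub-nodes per node.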
Pages: 9-19
Number of pages: 11