Using instance-level constraints in agglomerative hierarchical clustering: theoretical and empirical results

Cited by: 0
Authors
Ian Davidson
S. S. Ravi
Affiliations
[1] University of California, Davis, Department of Computer Science
[2] University at Albany, State University of New York, Department of Computer Science
Source
Data Mining and Knowledge Discovery | 2009 / Vol. 18
Keywords
Clustering; Constrained clustering; Semi-supervised learning
DOI
Not available
Abstract
Clustering with constraints is a powerful method that allows users to specify background knowledge and the expected cluster properties. Significant work has explored the incorporation of instance-level constraints into non-hierarchical clustering, but not into hierarchical clustering algorithms. In this paper we present a formal complexity analysis of the problem and show that constraints can be used to improve not only the quality of the resultant dendrogram but also the efficiency of the algorithms. This is particularly important since many agglomerative-style algorithms have running times that are quadratic (or faster-growing) functions of the number of instances to be clustered. We present several bounds on the improvement in the running times of algorithms obtainable using constraints.
Pages: 257-282
Number of pages: 25
Related papers
  • [1] Davidson I, Ravi SS (2007) The complexity of non-hierarchical clustering with instance and cluster level constraints. Data Min Knowl Disc 14:25-61
  • [2] Dragomirescu L, Postelnicu T (2007) A natural agglomerative clustering method for biology. Biometrical J 33:841-849
  • [3] Zhao Y, Karypis G (2005) Hierarchical clustering algorithms for document datasets. Data Min Knowl Disc 10:141-168