Semi-supervised hybrid clustering by integrating Gaussian mixture model and distance metric learning

Cited by: 0
Authors
Yihao Zhang
Junhao Wen
Xibin Wang
Zhuo Jiang
Affiliations
[1] Chongqing University, College of Computer Science
[2] Chongqing University, College of Software Engineering
Source
Journal of Intelligent Information Systems | 2015 / Vol. 45
Keywords
Semi-supervised clustering; Gaussian mixture model; Distance metric learning; Expectation maximization
DOI
Not available
Chinese Library Classification number
Subject classification number
Abstract
Semi-supervised clustering aims to aid and bias unsupervised clustering by employing a small amount of supervised information. The supervised information is generally given as pairwise constraints, which are used either to modify the objective function or to learn the distance measure. Much previous work has shown that clustering algorithms based on a distance metric perform significantly better than algorithms based on a probability distribution on some data sets, while the opposite holds on other data sets, so how to balance the two approaches becomes a key problem. In this paper, we propose a semi-supervised hybrid clustering algorithm that provides a principled framework for integrating a distance metric into the Gaussian mixture model, considering not only the intrinsic geometric information but also the probability distribution information of the data. Rather than using only the pairwise constraints, the labeled data are used to initialize the Gaussian distribution parameters and to construct the weight matrix of the regularizer, and we then adopt the Kullback-Leibler divergence as the “distance” measure to regularize the objective function. Experiments on several UCI data sets and on real-world data sets for Chinese Word Sense Induction demonstrate the effectiveness of our semi-supervised clustering algorithm.
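As a rough illustration of the regularization idea mentioned in the abstract, the following minimal Python sketch builds a weight matrix from labeled points and penalizes differences between the cluster-posterior distributions of linked points using the Kullback-Leibler divergence. The function names, the 0/1 weighting scheme, and the symmetrized form of the divergence are assumptions made for illustration only; the record does not give the paper's actual formulation.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same clusters."""
    # eps guards against log(0); terms with p_i == 0 contribute nothing.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Symmetrized KL divergence (an assumed choice, not taken from the paper)."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def label_weight_matrix(labels):
    """Hypothetical weighting: W[i][j] = 1 if i and j share a known label, else 0.

    Unlabeled points are marked with None and receive no links.
    """
    n = len(labels)
    return [
        [
            1.0
            if i != j and labels[i] is not None and labels[i] == labels[j]
            else 0.0
            for j in range(n)
        ]
        for i in range(n)
    ]


def kl_regularizer(posteriors, weights):
    """Sum of W[i][j] * symmetric_kl over unordered pairs of points."""
    n = len(posteriors)
    total = sum(
        weights[i][j] * symmetric_kl(posteriors[i], posteriors[j])
        for i in range(n)
        for j in range(n)
    )
    return 0.5 * total  # each unordered pair is counted twice above


# Posterior cluster memberships for three points over two clusters.
posteriors = [[0.9, 0.1], [0.9, 0.1], [0.1, 0.9]]

# Points 0 and 1 share a label and already agree: no penalty.
penalty_agree = kl_regularizer(posteriors, label_weight_matrix([0, 0, None]))

# Points 0 and 2 share a label but disagree: positive penalty.
penalty_disagree = kl_regularizer(posteriors, label_weight_matrix([0, None, 0]))
```

In a sketch like this, the penalty would be added to the negative log-likelihood of the mixture model, so that points sharing a label are pushed toward similar posterior distributions during the expectation-maximization updates.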
Pages: 113-130
Page count: 17