Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliation
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which gives it an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. A pairwise constraint indicates the relationship between two data points and is of one of two types: must-link (ML) or cannot-link (CL). We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, we introduce the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a chunklet-based Gibbs sampler for the C-DPM. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM shows superior clustering performance. In addition, SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
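The chunklet idea described in the abstract can be illustrated with a short Python sketch. It is not the authors' implementation: the function names (`build_chunklets`, `lift_cannot_links`, `log_block_marginal`, `gibbs_sweep`) and all parameters are hypothetical, and it assumes a toy 1-D Gaussian likelihood with known variance `sigma2` and a conjugate normal prior on each cluster mean, whereas the paper works with text and image features. ML pairs are compiled into chunklets with union-find, CL pairs are lifted to chunklet pairs, and a blocked Gibbs sweep moves whole chunklets between clusters.

```python
import math
import random
from collections import defaultdict


def build_chunklets(n, must_links):
    """Compile must-link pairs into chunklets (connected components) via union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_links:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for i in range(n):
        groups[find(i)].append(i)
    chunklets = sorted(groups.values())  # deterministic chunklet ids
    point_to_chunk = {i: c for c, pts in enumerate(chunklets) for i in pts}
    return chunklets, point_to_chunk


def lift_cannot_links(cannot_links, point_to_chunk):
    """Map point-level cannot-links to cannot-links between chunklets."""
    cl = defaultdict(set)
    for a, b in cannot_links:
        ca, cb = point_to_chunk[a], point_to_chunk[b]
        if ca == cb:
            raise ValueError(f"points {a} and {b} are must-linked and cannot-linked")
        cl[ca].add(cb)
        cl[cb].add(ca)
    return cl


def log_block_marginal(block, members, sigma2=1.0, mu0=0.0, tau2=10.0):
    """log p(block | members) for x ~ N(mu, sigma2), mu ~ N(mu0, tau2) (assumed model)."""
    prec, num = 1.0 / tau2, mu0 / tau2
    for x in members:  # posterior over mu given the cluster's current points
        prec += 1.0 / sigma2
        num += x / sigma2
    logp = 0.0
    for x in block:  # chain rule: add the block's points one at a time
        mean, var = num / prec, 1.0 / prec + sigma2
        logp -= 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)
        prec += 1.0 / sigma2
        num += x / sigma2
    return logp


def gibbs_sweep(x, chunklets, cl, z, alpha=1.0, rng=random):
    """Reassign each chunklet as a block; z[c] is the cluster label of chunklet c."""
    for c, pts in enumerate(chunklets):
        block, m = [x[i] for i in pts], len(pts)
        z[c] = None  # remove chunklet c from its current cluster
        clusters = defaultdict(list)
        for d, k in enumerate(z):
            if k is not None:
                clusters[k].append(d)
        options, logw = [], []
        for k, chunks in clusters.items():
            if any(d in cl.get(c, ()) for d in chunks):
                continue  # a cannot-link forbids cluster k
            members = [x[i] for d in chunks for i in chunklets[d]]
            n_k = len(members)
            prior = math.lgamma(n_k + m) - math.lgamma(n_k)  # log Gamma(n_k+m)/Gamma(n_k)
            options.append(k)
            logw.append(prior + log_block_marginal(block, members))
        options.append(max(clusters, default=-1) + 1)  # open a new cluster
        logw.append(math.log(alpha) + math.lgamma(m) + log_block_marginal(block, []))
        top = max(logw)
        z[c] = rng.choices(options, weights=[math.exp(w - top) for w in logw])[0]
    return z


if __name__ == "__main__":
    x = [0.0, 0.1, -0.1, 5.0, 5.2, 0.05]
    chunklets, p2c = build_chunklets(len(x), [(0, 1), (1, 2), (3, 4)])
    cl = lift_cannot_links([(2, 3), (4, 5)], p2c)
    z = list(range(len(chunklets)))  # start with one cluster per chunklet
    rng = random.Random(0)
    for _ in range(20):
        gibbs_sweep(x, chunklets, cl, z, rng=rng)
    print(chunklets, z)
```

Because each chunklet moves as a block, ML constraints hold by construction, and any cluster that contains a CL-linked chunklet is simply dropped from the candidate set. In this sketch the prior weights Γ(n_k + m)/Γ(n_k) for an existing cluster and αΓ(m) for a new one come from the DP's exchangeable partition probability when m points are moved together; the paper's exact conditional may be derived differently.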
Pages: 205-223
Page count: 18