Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliations
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically and is therefore well suited to data clustering. This paper investigates the influence of pairwise constraints on the DPM model. A pairwise constraint indicates the relationship between two data points and comes in two types: must-link (ML) and cannot-link (CL). We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). C-DPM introduces the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. SC-DPM shows superior clustering performance. In addition, SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
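The chunklet idea in the abstract can be illustrated with a minimal sketch. It assumes that chunklets are the connected components of the graph formed by the ML constraints, and that a CL constraint between two points is lifted to a constraint between the chunklets containing them. The function name `build_chunklets`, its parameters, and the toy constraints are hypothetical and chosen for illustration only; this is not the authors' implementation, and it does not cover the Gibbs sampler or the constraint selection step.

```python
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Group points into chunklets (assumed here to be the connected
    components of the ML graph) and lift CL constraints to chunklet pairs."""
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge the two components joined by each must-link constraint.
    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)

    # Relabel component roots as consecutive chunklet ids,
    # in order of first appearance.
    root_to_id = {}
    labels = []
    for i in range(n_points):
        r = find(i)
        if r not in root_to_id:
            root_to_id[r] = len(root_to_id)
        labels.append(root_to_id[r])

    chunklets = defaultdict(list)
    for i, c in enumerate(labels):
        chunklets[c].append(i)

    # A cannot-link between two points becomes a cannot-link
    # between the chunklets that contain them.
    chunklet_cl = set()
    for a, b in cannot_links:
        ca, cb = labels[a], labels[b]
        if ca == cb:
            raise ValueError(
                f"CL constraint ({a}, {b}) contradicts the ML constraints"
            )
        chunklet_cl.add((min(ca, cb), max(ca, cb)))

    return dict(chunklets), chunklet_cl


# Toy example: six points, three ML constraints, two CL constraints.
chunklets, chunklet_cl = build_chunklets(
    n_points=6,
    must_links=[(0, 1), (1, 2), (3, 4)],
    cannot_links=[(2, 3), (0, 4)],
)
print(chunklets)
print(chunklet_cl)
```

In this toy input, both CL constraints connect the same pair of chunklets, so they collapse into a single chunklet-level constraint; a point with no ML constraint forms a chunklet of its own.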
Pages: 205 - 223
Page count: 18
Related papers
  • [1] Axially Symmetric Data Clustering Through Dirichlet Process Mixture Models of Watson Distributions
    Fan, Wentao
    Bouguila, Nizar
    Du, Ji-Xiang
    Liu, Xin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (06) : 1683 - 1694
  • [2] Quantum annealing for Dirichlet process mixture models with applications to network clustering
    Sato, Issei
    Tanaka, Shu
    Kurihara, Kenichi
    Miyashita, Seiji
    Nakagawa, Hiroshi
    NEUROCOMPUTING, 2013, 121 : 523 - 531
  • [4] Online Data Clustering Using Variational Learning of a Hierarchical Dirichlet Process Mixture of Dirichlet Distributions
    Fan, Wentao
    Bouguila, Nizar
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS, DASFAA 2014, 2014, 8505 : 18 - 32
  • [5] Clustering with label constrained Dirichlet process mixture model
    Burhanuddin, Nurul Afiqah
    Adam, Mohd Bakri
    Ibrahim, Kamarulzaman
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 107
  • [6] A deep clustering framework integrating pairwise constraints and a VMF mixture model
    Ma, He
    Wu, Weipeng
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (06) : 3952 - 3972
  • [7] Selecting the precision parameter prior in Dirichlet process mixture models
    Murugiah, Siva
    Sweeting, Trevor
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2012, 142 (07) : 1947 - 1959
  • [8] On selecting a prior for the precision parameter of Dirichlet process mixture models
    Dorazio, Robert M.
    JOURNAL OF STATISTICAL PLANNING AND INFERENCE, 2009, 139 (09) : 3384 - 3390
  • [9] A Predictive Study of Dirichlet Process Mixture Models for Curve Fitting
    Wade, Sara
    Walker, Stephen G.
    Petrone, Sonia
    SCANDINAVIAN JOURNAL OF STATISTICS, 2014, 41 (03) : 580 - 605
  • [10] DIRICHLET PROCESS MIXTURE MODELS WITH MULTIPLE MODALITIES
    Paisley, John
    Carin, Lawrence
    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS, 2009 : 1613 - 1616