Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliation
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which gives it an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. A pairwise constraint, which is either a must-link (ML) or a cannot-link (CL) constraint, indicates the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, the concept of a chunklet is introduced: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM shows superior performance in data clustering. In addition, our SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
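The chunklet idea in the abstract can be illustrated with a short sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: must-links are merged by union-find (their transitive closure defines the chunklets), cannot-links between points are lifted to pairs of chunklets, and the prior part of a chunklet-level Gibbs step uses the standard Chinese restaurant process weights for adding a block of m points to a cluster (Γ(n_k + m)/Γ(n_k) for an existing cluster of size n_k, αΓ(m) for a new cluster), with any cluster holding a cannot-linked chunklet given zero probability. All function names here are hypothetical, and a complete sampler would also multiply each weight by the marginal likelihood of the chunklet's data under that cluster.

```python
import math
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Hypothetical helper: merge ML pairs into chunklets, lift CL pairs to chunklets."""
    parent = list(range(n_points))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Points sharing a root form one chunklet (transitive closure of must-links).
    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)
    chunklets = sorted(groups.values())  # deterministic chunklet ids
    chunk_of = {i: c for c, members in enumerate(chunklets) for i in members}

    # A cannot-link between two points becomes a cannot-link between their chunklets.
    chunk_cl = set()
    for a, b in cannot_links:
        ca, cb = chunk_of[a], chunk_of[b]
        if ca == cb:
            raise ValueError(f"points {a} and {b} are both must-linked and cannot-linked")
        chunk_cl.add((min(ca, cb), max(ca, cb)))
    return chunklets, chunk_cl


def blocked_clusters(c, chunk_cl, assignment):
    """Clusters holding a chunklet cannot-linked to chunklet c (c itself unassigned)."""
    return {assignment[b if a == c else a] for a, b in chunk_cl if c in (a, b)}


def crp_chunklet_log_prior(m, cluster_sizes, blocked, alpha):
    """Unnormalized log CRP prior for moving a chunklet of size m into each
    existing cluster (sizes exclude the chunklet) or into a new cluster (last entry)."""
    logs = [-math.inf if k in blocked else math.lgamma(n_k + m) - math.lgamma(n_k)
            for k, n_k in enumerate(cluster_sizes)]
    logs.append(math.log(alpha) + math.lgamma(m))  # open a new cluster
    return logs


def normalize(logs):
    """Turn log weights into probabilities; -inf entries become exactly 0."""
    mx = max(logs)
    w = [math.exp(x - mx) for x in logs]
    s = sum(w)
    return [x / s for x in w]


if __name__ == "__main__":
    chunklets, chunk_cl = build_chunklets(6, [(0, 1), (1, 2), (3, 4)], [(2, 5), (0, 3)])
    print(chunklets)  # [[0, 1, 2], [3, 4], [5]]
    # Chunklet of size 2; clusters of sizes 3 and 1; cluster 0 is blocked by a cannot-link.
    probs = normalize(crp_chunklet_log_prior(2, [3, 1], {0}, alpha=1.0))
    print([round(p, 3) for p in probs])  # [0.0, 0.667, 0.333]
```

With m = 1 the weights reduce to the familiar CRP values n_k and α, so the chunklet form generalizes the single-point update rather than replacing it.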
Pages: 205–223
Number of pages: 18
Related papers
50 in total
  • [11] Hybrid Dirichlet mixture models for functional data
    Petrone, Sonia
    Guindani, Michele
    Gelfand, Alan E.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-STATISTICAL METHODOLOGY, 2009, 71 : 755 - 782
  • [12] Partially collapsed parallel Gibbs sampler for Dirichlet process mixture models
    Yerebakan, Halid Ziya
    Dundar, Murat
    PATTERN RECOGNITION LETTERS, 2017, 90 : 22 - 27
  • [13] Expectation propagation learning of a Dirichlet process mixture of Beta-Liouville distributions for proportional data clustering
    Fan, Wentao
    Bouguila, Nizar
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2015, 43 : 1 - 14
  • [14] Positive vectors clustering using inverted Dirichlet finite mixture models
    Bdiri, Taoufik
    Bouguila, Nizar
    EXPERT SYSTEMS WITH APPLICATIONS, 2012, 39 (02) : 1869 - 1882
  • [15] A tutorial on Dirichlet process mixture modeling
    Li, Yuelin
    Schofield, Elizabeth
    Gonen, Mithat
    JOURNAL OF MATHEMATICAL PSYCHOLOGY, 2019, 91 : 128 - 144
  • [16] Affinity Propagation Clustering With Pairwise Constraints
    Zhang, Lijia
    Cheng, Lianglun
    PROCEEDINGS OF THE 2016 2ND WORKSHOP ON ADVANCED RESEARCH AND TECHNOLOGY IN INDUSTRY APPLICATIONS, 2016, 81 : 527 - 531
  • [17] Face Clustering: Representation and Pairwise Constraints
    Shi, Yichun
    Otto, Charles
    Jain, Anil K.
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2018, 13 (07) : 1626 - 1640
  • [18] Boolean Kernels and Clustering with Pairwise Constraints
    Kusunoki, Yoshifumi
    Tanino, Tetsuzo
    2014 IEEE INTERNATIONAL CONFERENCE ON GRANULAR COMPUTING (GRC), 2014, : 141 - 146
  • [19] BoostCluster: Boosting Clustering by Pairwise Constraints
    Liu, Yi
    Jin, Rong
    Jain, Anil K.
    KDD-2007 PROCEEDINGS OF THE THIRTEENTH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2007, : 450 - 459
  • [20] Dirichlet process HMM mixture models with application to music analysis
    Qi, Yuting
    Paisley, John William
    Carin, Lawrence
    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL II, PTS 1-3, 2007, : 465 - +