Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliations
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which gives it an advantage in data clustering. This paper investigates the influence of pairwise constraints in the DPM model. Pairwise constraints come in two types, must-link (ML) and cannot-link (CL), and indicate the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). C-DPM introduces the concept of a chunklet: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for C-DPM based on chunklets. We further propose a principled approach for selecting the most useful constraints, which are then incorporated into SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. SC-DPM achieves superior clustering performance. In addition, SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
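The chunklet idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes chunklets are the transitive closure of the must-link pairs (computed here with a union-find structure) and that each cannot-link pair is lifted to the pair of chunklets containing its endpoints. The function name `build_chunklets` and its arguments are hypothetical, chosen only for this illustration.

```python
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Group points into chunklets and lift cannot-links to chunklet level.

    Assumption (not taken from the paper): a chunklet is a connected
    component of the must-link graph; unconstrained points form singletons.
    """
    parent = list(range(n_points))

    def find(i):
        # Follow parent pointers to the root, halving the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so each root is the
            # smallest member of its chunklet.
            parent[max(ra, rb)] = min(ra, rb)

    groups = defaultdict(list)
    for i in range(n_points):
        groups[find(i)].append(i)

    # Chunklets ordered by their smallest member.
    chunklets = [groups[root] for root in sorted(groups)]
    index_of = {p: k for k, members in enumerate(chunklets) for p in members}

    chunklet_cl = set()
    for a, b in cannot_links:
        ca, cb = index_of[a], index_of[b]
        if ca == cb:
            raise ValueError(f"cannot-link ({a}, {b}) contradicts must-links")
        chunklet_cl.add((min(ca, cb), max(ca, cb)))

    return chunklets, chunklet_cl


if __name__ == "__main__":
    chunklets, chunklet_cl = build_chunklets(
        n_points=6,
        must_links=[(0, 1), (1, 2), (4, 5)],
        cannot_links=[(2, 3), (0, 4)],
    )
    print(chunklets)            # [[0, 1, 2], [3], [4, 5]]
    print(sorted(chunklet_cl))  # [(0, 1), (0, 2)]
```

Under these assumptions, a sampler would assign a cluster label to each chunklet rather than to each point, and would rule out placing two chunklets in the same cluster when a cannot-link exists between them.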
Pages: 205-223
Page count: 18