Dirichlet Process Mixture Models with Pairwise Constraints for Data Clustering

Cited by: 0
Authors
Li C. [1]
Rana S. [1]
Phung D. [1]
Venkatesh S. [1]
Affiliation
[1] Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong
Keywords
Bayesian nonparametric; Constrained clustering; Dirichlet process; Mixture models; Pairwise constraints; Short-text clustering
DOI
10.1007/s40745-016-0082-z
Abstract
The Dirichlet process mixture (DPM) model, a typical Bayesian nonparametric model, can infer the number of clusters automatically, which makes it well suited to data clustering. This paper investigates the influence of pairwise constraints in the DPM model. Pairwise constraints come in two types, must-link (ML) and cannot-link (CL), and indicate the relationship between two data points. We propose two models that incorporate pairwise constraints: the constrained DPM (C-DPM) and the constrained DPM with selected constraints (SC-DPM). In C-DPM, the concept of a chunklet is introduced: ML constraints are compiled into chunklets, and CL constraints exist between chunklets. We derive a Gibbs sampler for the C-DPM based on chunklets. We further propose a principled approach to selecting the most useful constraints, which are then incorporated into the SC-DPM. We evaluate the proposed models on three real datasets: the 20 Newsgroups dataset, the NUS-WIDE image dataset, and a Facebook comments dataset that we collected ourselves. Our SC-DPM shows superior performance in data clustering. In addition, SC-DPM can potentially be used for short-text clustering. © 2016, Springer-Verlag Berlin Heidelberg.
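The chunklet construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `build_chunklets`, the use of union-find to take the transitive closure of ML constraints, and the toy constraints are all assumptions made for illustration. Points joined by ML constraints are merged into one chunklet, and each CL constraint is lifted to a pair of chunklet ids.

```python
from collections import defaultdict


def build_chunklets(n_points, must_links, cannot_links):
    """Hypothetical helper: merge ML-connected points into chunklets
    and re-express CL constraints between chunklet ids."""
    parent = list(range(n_points))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each ML constraint merges the two components it touches.
    for a, b in must_links:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Relabel component roots as consecutive chunklet ids,
    # numbered in order of first appearance.
    root_to_id = {}
    chunklet_of = []
    for i in range(n_points):
        r = find(i)
        if r not in root_to_id:
            root_to_id[r] = len(root_to_id)
        chunklet_of.append(root_to_id[r])

    members = defaultdict(list)
    for i, c in enumerate(chunklet_of):
        members[c].append(i)

    # Lift point-level CL constraints to unordered chunklet pairs.
    chunklet_cl = set()
    for a, b in cannot_links:
        ca, cb = chunklet_of[a], chunklet_of[b]
        if ca == cb:
            raise ValueError(
                f"inconsistent constraints: points {a} and {b} are must-linked"
            )
        chunklet_cl.add((min(ca, cb), max(ca, cb)))

    return chunklet_of, dict(members), chunklet_cl


# Toy example with six points (assumed data, not from the paper).
chunklet_of, members, chunklet_cl = build_chunklets(
    6, must_links=[(0, 1), (1, 2), (4, 5)], cannot_links=[(2, 4), (3, 0)]
)
print(chunklet_of)          # chunklet id of each point
print(members)              # points grouped by chunklet
print(sorted(chunklet_cl))  # CL constraints between chunklets
```

Under this reading, a sampler would assign one cluster label per chunklet rather than per point, and a CL pair would forbid two chunklets from sharing a cluster; how C-DPM's Gibbs updates do this in detail is given in the paper, not here.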
Pages: 205 - 223
Page count: 18
Related papers
50 in total
  • [21] Automated Movement Detection with Dirichlet Process Mixture Models and Electromyography
    Cooray, Navin
    Li, Zhenglin
    Wang, Jinzhuo
    Lo, Christine
    Arvaneh, Mahnaz
    Symmonds, Mkael
    Hu, Michele
    De Vos, Maarten
    Mihaylova, Lyudmila S.
    2022 25TH INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2022), 2022,
  • [22] Dirichlet process mixture models for finding shared structure between two related data sets
    Leen, Gayle
    Fyfe, Colin
    ADVANCES ON ARTIFICIAL INTELLIGENCE, KNOWLEDGE ENGINEERING AND DATA BASES, PROCEEDINGS, 2008, : 31 - +
  • [23] Non-Gaussian Data Clustering via Expectation Propagation Learning of Finite Dirichlet Mixture Models and Applications
    Fan, Wentao
    Bouguila, Nizar
    NEURAL PROCESSING LETTERS, 2014, 39 (02) : 115 - 135
  • [24] Non-Gaussian Data Clustering via Expectation Propagation Learning of Finite Dirichlet Mixture Models and Applications
    Wentao Fan
    Nizar Bouguila
    Neural Processing Letters, 2014, 39 : 115 - 135
  • [25] Tensor Dirichlet Process Multinomial Mixture Model with Graphs for Passenger Trajectory Clustering
    Li, Ziyue
    Yan, Hao
    Zhang, Chen
    Ketter, Wolfgang
    Tsung, Fugee
    PROCEEDINGS OF THE 6TH ACM SIGSPATIAL INTERNATIONAL WORKSHOP ON AI FOR GEOGRAPHIC KNOWLEDGE DISCOVERY, GEOAI 2023, 2023, : 121 - 128
  • [26] Generate pairwise constraints from unlabeled data for semi-supervised clustering
    Masud, Md Abdul
    Huang, Joshua Zhexue
    Zhong, Ming
    Fu, Xianghua
    DATA & KNOWLEDGE ENGINEERING, 2019, 123
  • [27] Constrained Clustering: General Pairwise and Cardinality Constraints
    Bibi, Adel
    Alqahtani, Ali
    Ghanem, Bernard
    IEEE ACCESS, 2023, 11 : 5824 - 5836
  • [28] Improving clustering with pairwise constraints: a discriminative approach
    Hong Zeng
    Aiguo Song
    Yiu Ming Cheung
    Knowledge and Information Systems, 2013, 36 : 489 - 515
  • [29] Improving clustering with pairwise constraints: a discriminative approach
    Zeng, Hong
    Song, Aiguo
    Cheung, Yiu Ming
    KNOWLEDGE AND INFORMATION SYSTEMS, 2013, 36 (02) : 489 - 515
  • [30] Clustering Mixed-Type Data via Dirichlet Process Mixture Model with Cluster-Specific Covariance Matrices
    Burhanuddin, Nurul Afiqah
    Ibrahim, Kamarulzaman
    Zulkafli, Hani Syahida
    Mustapha, Norwati
    SYMMETRY-BASEL, 2024, 16 (06):