Research on Dirichlet process mixture model for clustering

Cited by: 0
Authors
Zhang B. [1]
Zhang K. [1, 2]
Zhong L. [1]
Zhang X. [3]
Affiliations
[1] School of Computer Science and Technology, Wuhan University of Technology, Wuhan
[2] School of Mechanical Engineering, Wuhan Polytechnic University, Wuhan
[3] Department of Electronic Information Engineering, Wuhan City Vocational College, Wuhan
Source
Ingenierie des Systemes d'Information | 2019 / Vol. 24 / No. 2
Funding
National Natural Science Foundation of China
Keywords
Clustering; DPMM; Hierarchical DPMM; Nonparametric Bayesian
DOI
10.18280/isi.240209
Abstract
This paper aims to develop a clustering method that neither requires the number of clusters to be predefined nor incurs a high computing cost. For this purpose, the Dirichlet Process Mixture Model (DPMM), which is based on the nonparametric Bayesian method, was introduced. Three datasets, ranging from simple to complex, were selected for the experiments. The results on the first two datasets showed that the DPMM is highly flexible and reliable: it did not need to know the number of clusters in advance and was robust to different reasonable parameter settings. However, the DPMM failed to achieve desirable results on the third dataset. To overcome the limitation of one-time DPMM clustering on complex datasets, the notion of hierarchical clustering was adopted to form the hierarchical DPMM algorithm, which produced better clustering results than the DPMM. The paper provides rules for selecting parameters and the hierarchical DPMM algorithm to support the effective use of the DPMM. © 2019 International Information and Engineering Technology Association. All rights reserved.
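The abstract's central claim, that a DPMM does not need the number of clusters in advance, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one-dimensional data, a Gaussian likelihood with known variance `sigma2`, a conjugate normal prior on cluster means (`mu0`, `tau2`), and collapsed Gibbs sampling under the Chinese restaurant process view of the Dirichlet process. The function names, hyperparameter values, and toy data are all hypothetical.

```python
import math
import random


def log_predictive(x, n, s, mu0=0.0, tau2=100.0, sigma2=1.0):
    """Log predictive density of x for a cluster holding n points with sum s.

    Assumes a N(mu0, tau2) prior on the cluster mean and known variance sigma2.
    With n == 0 this is the prior predictive used for a brand-new cluster.
    """
    precision = 1.0 / tau2 + n / sigma2
    mean = (mu0 / tau2 + s / sigma2) / precision
    var = 1.0 / precision + sigma2
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def dpmm_gibbs(data, alpha=1.0, sweeps=50, seed=0):
    """Collapsed Gibbs sampler for a 1-D DP mixture; returns one label per point."""
    rng = random.Random(seed)
    labels = [0] * len(data)        # start with every point in a single cluster
    counts = {0: len(data)}         # cluster label -> number of points
    sums = {0: sum(data)}           # cluster label -> sum of its points
    next_label = 1                  # label reserved for the next new cluster
    for _ in range(sweeps):
        for i, x in enumerate(data):
            # Remove point i from its current cluster.
            k = labels[i]
            counts[k] -= 1
            sums[k] -= x
            if counts[k] == 0:
                del counts[k], sums[k]
            # Existing clusters: weight proportional to size times predictive.
            options = list(counts)
            logw = [math.log(counts[c]) + log_predictive(x, counts[c], sums[c])
                    for c in options]
            # New cluster: weight proportional to alpha times prior predictive.
            options.append(next_label)
            logw.append(math.log(alpha) + log_predictive(x, 0, 0.0))
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            choice = rng.choices(options, weights=weights)[0]
            if choice == next_label:
                counts[choice] = 0
                sums[choice] = 0.0
                next_label += 1
            counts[choice] += 1
            sums[choice] += x
            labels[i] = choice
    return labels


# Toy data: two well-separated groups; the number of clusters is never passed in.
data = ([-10.0 + 0.1 * i for i in range(-5, 5)]
        + [10.0 + 0.1 * i for i in range(-5, 5)])
labels = dpmm_gibbs(data, alpha=1.0, sweeps=50, seed=0)
print(len(set(labels)), "clusters found")
```

The concentration parameter `alpha` controls how readily a new cluster is opened, which is the kind of parameter the abstract's selection rules concern. The sampler returns a single posterior draw, so small spurious clusters can appear in any given run.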
Pages: 183-189
Number of pages: 6