Short text clustering based on Pitman-Yor process mixture model

Cited by: 29
Authors
Qiang, Jipeng [1]
Li, Yun [1]
Yuan, Yunhao [1]
Wu, Xindong [2,3]
Affiliations
[1] Yangzhou Univ, Dept Comp Sci, Yangzhou, Jiangsu, Peoples R China
[2] Hefei Univ Technol, Dept Comp Sci, Hefei, Anhui, Peoples R China
[3] Univ Louisiana Lafayette, Sch Comp & Informat, Lafayette, LA 70504 USA
Funding
National Natural Science Foundation of China
Keywords
LDA; Pitman-Yor process; Short text clustering; Nonnegative matrix factorization; Algorithms
DOI
10.1007/s10489-017-1055-4
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
To find the appropriate number of clusters in short text clustering, models based on the Dirichlet Multinomial Mixture (DMM) require a maximum possible cluster number to be specified before the true number of clusters can be inferred. However, it is difficult to choose a proper value because the true number of clusters in a short text corpus is not known beforehand. Moreover, when DMM uses the Dirichlet process as its prior, the cluster-size distribution decays exponentially as the number of clusters grows. In this paper we therefore propose a novel model based on the Pitman-Yor process to capture the power-law behavior of the cluster-size distribution. Specifically, each text chooses one of the active clusters or a new cluster with probabilities derived from the Pitman-Yor Process Mixture model (PYPM). Discriminative and non-discriminative words are identified automatically to further improve clustering. Parameters are estimated efficiently by collapsed Gibbs sampling, and experimental results show that PYPM is robust and effective compared with state-of-the-art models.
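As an illustration of the assignment step described above, the sketch below (Python, assuming NumPy) combines the Pitman-Yor prior, which weights an existing cluster k by (n_k - d) and a new cluster by (theta + d*K), with a collapsed Dirichlet-multinomial word likelihood. Function and parameter names such as assignment_probs, discount, concentration, and beta are assumptions made for this sketch and do not come from the paper's implementation; repeated words within one document are not tracked, which is a simplification.

import numpy as np

def assignment_probs(doc_words, cluster_sizes, cluster_word_counts,
                     cluster_totals, vocab_size,
                     discount=0.1, concentration=1.0, beta=0.1):
    # Unnormalized probabilities of assigning one short text (doc_words,
    # a list of word ids) to each active cluster or to a new cluster.
    # Pitman-Yor prior: existing cluster k gets (n_k - discount),
    # a new cluster gets (concentration + discount * K).
    K = len(cluster_sizes)
    probs = np.empty(K + 1)
    for k in range(K):
        prior = cluster_sizes[k] - discount
        likelihood = 1.0
        for i, w in enumerate(doc_words):
            # collapsed Dirichlet-multinomial word probability (simplified:
            # repeated words within the same document are not tracked)
            likelihood *= (cluster_word_counts[k].get(w, 0) + beta) / \
                          (cluster_totals[k] + vocab_size * beta + i)
        probs[k] = prior * likelihood
    # probability of opening a brand-new cluster
    likelihood_new = 1.0
    for i, w in enumerate(doc_words):
        likelihood_new *= beta / (vocab_size * beta + i)
    probs[K] = (concentration + discount * K) * likelihood_new
    return probs

# toy usage: two active clusters over a 3-word vocabulary
rng = np.random.default_rng(0)
sizes = [5, 3]                               # documents per cluster
word_counts = [{0: 4, 1: 6}, {2: 5}]         # word id -> count per cluster
totals = [10, 5]                             # total words per cluster
p = assignment_probs([0, 1], sizes, word_counts, totals, vocab_size=3)
cluster = rng.choice(len(p), p=p / p.sum())  # 0, 1, or 2 (a new cluster)

With a positive discount the prior over cluster sizes follows a power law, which is the behavior the abstract refers to; setting the discount to 0 recovers the Dirichlet-process (Chinese restaurant process) prior, whose cluster-size distribution decays exponentially.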
Pages: 1802-1812
Number of pages: 11
Related Papers
50 records in total
  • [1] Short text clustering based on Pitman-Yor process mixture model
    Qiang, Jipeng
    Li, Yun
    Yuan, Yunhao
    Wu, Xindong
    Applied Intelligence, 2018, 48: 1802-1812
  • [2] The Pitman-Yor multinomial process for mixture modelling
    Lijoi, Antonio
    Prunster, Igor
    Rigon, Tommaso
    BIOMETRIKA, 2020, 107(4): 891-906
  • [3] Simultaneous clustering and feature selection via nonparametric Pitman-Yor process mixture models
    Fan, Wentao
    Bouguila, Nizar
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2019, 10(10): 2753-2766
  • [4] DYNAMIC TEXTURES CLUSTERING USING A HIERARCHICAL PITMAN-YOR PROCESS MIXTURE OF DIRICHLET DISTRIBUTIONS
    Fan, Wentao
    Bouguila, Nizar
    2015 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2015: 296-300
  • [5] Hierarchical Pitman-Yor and Dirichlet Process for Language Model
    Chien, Jen-Tzung
    Chang, Ying-Lan
    14TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2013), VOLS 1-5, 2013: 2211-2215
  • [6] Stochastic Approximations to the Pitman-Yor Process
    Arbel, Julyan
    De Blasi, Pierpaolo
    Prunster, Igor
    BAYESIAN ANALYSIS, 2019, 14(4): 1201-1219
  • [7] Background Subtraction with a Hierarchical Pitman-Yor Process Mixture Model of Generalized Gaussian Distributions
    Amudala, Srikanth
    Ali, Samr
    Bouguila, Nizar
    2020 IEEE 21ST INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION FOR DATA SCIENCE (IRI 2020), 2020: 112-119
  • [8] LIMIT THEOREMS ASSOCIATED WITH THE PITMAN-YOR PROCESS
    Feng, Shui
    Gao, Fuqing
    Zhou, Youzhou
    ADVANCES IN APPLIED PROBABILITY, 2017, 49(2): 581-602
  • [9] A Hierarchical Pitman-Yor mixture of Scaled Dirichlet Distributions
    Baghdadi, Ali
    Manouchehri, Narges
    Bouguila, Nizar
    2022 IEEE 31ST INTERNATIONAL SYMPOSIUM ON INDUSTRIAL ELECTRONICS (ISIE), 2022: 168-173
  • [10] Pitman-Yor process mixture model for community structure exploration considering latent interaction patterns
    Wang, Jing
    Li, Kan
    CHINESE PHYSICS B, 2021, 30(12)