Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm

Cited by: 32
Authors
Li, Weijiang [1]
Feng, Yanming [1]
Li, Dongjun [2]
Yu, Zhengtao [1]
Affiliations
[1] Kunming Univ Sci & Technol, Dept Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Soochow Univ, Jinan Qingqi Peugeot Motorcycle Co Ltd, R&D Dept, Jinan 250104, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
short text; topic model; topic discovery; K-means clustering;
DOI
10.3103/S0146411616040040
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The development of micro-blogs, which generate large-scale short texts, provides people with convenient communication. At the same time, discovering topics from these short texts has become a genuinely difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), have difficulty modeling short texts because they suffer from severe data sparsity when processing them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the documents of different topics differ distinctly. In this paper, the Biterm Topic Model (BTM) is employed to process short texts (micro-blog data) in order to alleviate the sparsity problem. We further integrate the K-means clustering algorithm into BTM for topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that our method discovers topics effectively.
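The abstract does not say exactly how K-means is attached to BTM, so the following is only a minimal sketch under one plausible reading, not the authors' method. It assumes that BTM is trained on the corpus-wide set of biterms (unordered word pairs, with each short post treated as one window) using the standard BTM collapsed Gibbs update. It then infers a topic-proportion vector P(z|d) for each post and clusters those vectors with plain Lloyd's K-means. The function names (`train_btm`, `doc_topics`, `kmeans`), the hyperparameters `alpha` and `beta`, and the toy posts are all hypothetical. The code uses only the Python standard library (3.8+ for `math.dist`).

```python
import itertools
import math
import random
from collections import Counter


def extract_biterms(doc):
    """Every unordered word pair in one short post (the whole post is one window)."""
    return [tuple(sorted(pair)) for pair in itertools.combinations(doc, 2)]


def train_btm(docs, num_topics, alpha, beta, iters, seed=0):
    """Hypothetical helper: collapsed Gibbs sampling over the corpus-wide biterm set."""
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    widx = {w: i for i, w in enumerate(vocab)}
    m = len(vocab)
    biterms = [(widx[a], widx[b]) for d in docs for a, b in extract_biterms(d)]
    n_z = [0] * num_topics                           # biterms assigned to each topic
    n_wz = [[0] * m for _ in range(num_topics)]      # word counts per topic
    assign = []
    for wi, wj in biterms:                           # random initial topic per biterm
        z = rng.randrange(num_topics)
        assign.append(z)
        n_z[z] += 1
        n_wz[z][wi] += 1
        n_wz[z][wj] += 1
    for _ in range(iters):
        for i, (wi, wj) in enumerate(biterms):
            z = assign[i]                            # remove the biterm's current assignment
            n_z[z] -= 1
            n_wz[z][wi] -= 1
            n_wz[z][wj] -= 1
            weights = []
            for k in range(num_topics):
                denom = 2 * n_z[k] + m * beta        # each biterm adds two word tokens
                weights.append((n_z[k] + alpha) * (n_wz[k][wi] + beta)
                               * (n_wz[k][wj] + beta) / denom ** 2)
            z = rng.choices(range(num_topics), weights=weights)[0]
            assign[i] = z
            n_z[z] += 1
            n_wz[z][wi] += 1
            n_wz[z][wj] += 1
    theta = [(n_z[k] + alpha) / (len(biterms) + num_topics * alpha) for k in range(num_topics)]
    phi = [[(n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in range(m)]
           for k in range(num_topics)]
    return widx, theta, phi


def doc_topics(doc, widx, theta, phi):
    """P(z|d) = sum over biterms b in d of P(z|b) * P(b|d)."""
    k_count = len(theta)
    counts = Counter(extract_biterms(doc))
    total = sum(counts.values())
    if total == 0:                                   # one-word post: no biterms, use uniform
        return [1.0 / k_count] * k_count
    p = [0.0] * k_count
    for (a, b), c in counts.items():
        pzb = [theta[k] * phi[k][widx[a]] * phi[k][widx[b]] for k in range(k_count)]
        s = sum(pzb)
        for k in range(k_count):
            p[k] += pzb[k] / s * c / total
    return p


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means with Euclidean distance."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                              # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Assumption: K-means runs on BTM's per-post topic vectors; the paper's integration may differ.
posts = [["football", "match", "goal"], ["goal", "football", "team"],
         ["stock", "market", "price"], ["market", "price", "rise"]]
widx, theta, phi = train_btm(posts, num_topics=2, alpha=0.5, beta=0.01, iters=100)
vectors = [doc_topics(d, widx, theta, phi) for d in posts]
labels, _ = kmeans(vectors, k=2)
print(labels)
```

Clustering the low-dimensional P(z|d) vectors, rather than raw word counts, is one way to let K-means benefit from BTM's corpus-level word co-occurrence statistics. That matters here because individual posts are too short to give reliable term vectors.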
Pages: 271-277
Page count: 7