Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm

Cited by: 32
Authors
Li, Weijiang [1]
Feng, Yanming [1]
Li, Dongjun [2]
Yu, Zhengtao [1]
Affiliations
[1] Kunming Univ Sci & Technol, Dept Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Soochow Univ, Jinan Qingqi Peugeot Motorcycle Co Ltd, R&D Dept, Jinan 250104, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
short text; topic model; topic discovery; K-means clustering
DOI
10.3103/S0146411616040040
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The development of micro-blogging, which generates large-scale short texts, provides people with convenient communication. At the same time, discovering topics from these short texts has become a genuinely difficult problem. Traditional topic models, such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA), have difficulty modeling short texts because they suffer from severe data sparsity when processing them. Moreover, the K-means clustering algorithm can make topics discriminative when the dataset is dense and the documents of different topics differ distinctly. In this paper, the Biterm Topic Model (BTM) is employed to process short texts (micro-blog data) in order to alleviate the sparsity problem. We further integrate the K-means clustering algorithm into BTM for topic discovery. Experimental results on Sina micro-blog short-text collections demonstrate that our method discovers topics effectively.
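The abstract does not say exactly how K-means is combined with BTM, so the following is only a minimal, standard-library Python sketch under one stated assumption: BTM is trained with the usual collapsed Gibbs sampler over all biterms (unordered word pairs) in the corpus, each post is then represented by its inferred topic proportions p(z|d) = Σ_b p(z|b) p(b|d), and K-means groups posts on those vectors. Posts are assumed to be pre-tokenized word lists. All function names (`extract_biterms`, `train_btm`, `doc_topics`, `kmeans`, `detect_topics`), the hyperparameters `alpha`/`beta`, and the farthest-first K-means initialization are hypothetical illustrative choices, not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations


def extract_biterms(doc):
    """Every unordered word pair in one short text (BTM's 'biterms')."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


def train_btm(docs, n_topics, alpha=1.0, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for BTM over the corpus-wide biterm set.

    Returns theta (corpus-level topic proportions) and phi, where
    phi[z] maps each vocabulary word to p(w | z). Hyperparameters are
    illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    vocab = sorted({w for d in docs for w in d})
    m = len(vocab)
    biterms = [b for d in docs for b in extract_biterms(d)]
    n_z = [0] * n_topics                                 # biterms per topic
    n_wz = [defaultdict(int) for _ in range(n_topics)]   # word counts per topic
    assign = []

    def update(z, wi, wj, delta):
        n_z[z] += delta
        n_wz[z][wi] += delta
        n_wz[z][wj] += delta

    for wi, wj in biterms:                               # random initial topics
        z = rng.randrange(n_topics)
        assign.append(z)
        update(z, wi, wj, 1)
    for _ in range(iters):
        for idx, (wi, wj) in enumerate(biterms):
            update(assign[idx], wi, wj, -1)              # exclude this biterm
            # p(z | rest) ∝ (n_z + α)(n_wi|z + β)(n_wj|z + β) / (2 n_z + Mβ)^2
            weights = [(n_z[k] + alpha) * (n_wz[k][wi] + beta) * (n_wz[k][wj] + beta)
                       / (2 * n_z[k] + m * beta) ** 2 for k in range(n_topics)]
            z = rng.choices(range(n_topics), weights=weights)[0]
            assign[idx] = z
            update(z, wi, wj, 1)
    theta = [(n_z[k] + alpha) / (len(biterms) + n_topics * alpha) for k in range(n_topics)]
    phi = [{w: (n_wz[k][w] + beta) / (2 * n_z[k] + m * beta) for w in vocab}
           for k in range(n_topics)]
    return theta, phi


def doc_topics(doc, theta, phi):
    """p(z|d) = sum over biterms b of p(z|b) p(b|d), with p(b|d) uniform."""
    biterms = [b for b in extract_biterms(doc) if b[0] in phi[0] and b[1] in phi[0]]
    if not biterms:                                      # e.g. a one-word post
        return list(theta)
    mix = [0.0] * len(theta)
    for wi, wj in biterms:
        p = [t * pz[wi] * pz[wj] for t, pz in zip(theta, phi)]   # unnormalized p(z|b)
        total = sum(p)
        for z, pzb in enumerate(p):
            mix[z] += pzb / total / len(biterms)
    return mix


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-first initialization."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    centroids = [list(points[0])]
    while len(centroids) < k:                # add the point farthest from chosen centroids
        far = max(points, key=lambda p: min(dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda c: dist2(p, centroids[c])) for p in points]
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def detect_topics(docs, n_topics, n_clusters, seed=0):
    """Hypothetical pipeline: BTM topic vectors per post, then K-means on them."""
    theta, phi = train_btm(docs, n_topics, seed=seed)
    vectors = [doc_topics(d, theta, phi) for d in docs]
    labels, _ = kmeans(vectors, n_clusters)
    return labels, phi


if __name__ == "__main__":
    # Hypothetical, already-tokenized posts; real Sina micro-blog text would
    # typically need Chinese word segmentation and stop-word removal first.
    posts = [["rain", "storm", "weather"], ["weather", "cold", "rain"],
             ["goal", "match", "team"], ["team", "coach", "goal"]]
    labels, phi = detect_topics(posts, n_topics=2, n_clusters=2)
    print(labels)
```

The point of the biterm representation is that word co-occurrence statistics are pooled over the whole corpus rather than estimated per post, which is how BTM sidesteps the per-document sparsity that hurts PLSA and LDA on short texts; K-means then only has to separate low-dimensional, dense topic-proportion vectors.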
Pages: 271-277
Page count: 7
Related papers
50 in total
  • [31] A Novel Adaptive Motion Detection based on K-Means Clustering
    Tao, Fan
    Lin-Sheng, Li
    Qi-Chuan, Tian
    ICCSIT 2010 - 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, VOL 3, 2010, : 136 - 140
  • [32] Intrusion Detection Based on Simulated Annealing and K-means Clustering
    Wu Jian
    PROCEEDINGS OF 2010 INTERNATIONAL CONFERENCE ON INFORMATION TECHNOLOGY AND INDUSTRIAL ENGINEERING, VOLS I AND II, 2010, : 1001 - 1005
  • [33] K-SVM: An Effective SVM Algorithm Based on K-means Clustering
    Yao, Yukai
    Liu, Yang
    Yu, Yongqing
    Xu, Hong
    Lv, Weiming
    Li, Zhao
    Chen, Xiaoyun
    JOURNAL OF COMPUTERS, 2013, 8 (10) : 2632 - 2639
  • [34] Identification of electricity theft based on the k-means clustering method
    Lin, Qian
    Li, Mingming
    Feng, Shuhui
    Yang, Jingjing
    Sun, Xiaopeng
    Li, Jiangtao
    Wang, Zhiyuan
    Zhang, Jinghui
    Xie, Xiangmin
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON POWER ELECTRONICS SYSTEMS AND APPLICATIONS, PESA, 2022
  • [35] Oversampling Method Based on Gaussian Distribution and K-Means Clustering
    Hassan, Masoud Muhammed
    Eesa, Adel Sabry
    Mohammed, Ahmed Jameel
    Arabo, Wahab Kh
    CMC-COMPUTERS MATERIALS & CONTINUA, 2021, 69 (01) : 451 - 469
  • [36] A novel high-quality community detection algorithm based on modified K-means clustering
    Li, Jingyong
    Huang, Lan
    Bai, Tian
    Wang, Zhe
    International Journal of Advancements in Computing Technology, 2012, 4 (11) : 248 - 256
  • [37] Point Cloud Simplification Method Based on k-Means Clustering
    He Yibo
    Chen Ranli
    Wu Kan
    Duan Zhixin
    LASER & OPTOELECTRONICS PROGRESS, 2019, 56 (09)
  • [38] Method of Redundant Features Eliminating based on K-means Clustering
    Li, Limin
    Wang, Zhongsheng
    MATERIALS SCIENCE, CIVIL ENGINEERING AND ARCHITECTURE SCIENCE, MECHANICAL ENGINEERING AND MANUFACTURING TECHNOLOGY, PTS 1 AND 2, 2014, 488-489 : 1023 - 1026
  • [39] An Improved Fractal Coding Method based on K-means Clustering
    Guo, Hui
    He, Jie
    Proceedings of the 2016 4th International Conference on Mechanical Materials and Manufacturing Engineering (MMME 2016), 2016, 79 : 294 - 300
  • [40] The validation method of simulation model based on K-means clustering and Fisher discriminant analysis
    Jiao Song
    Li Wei
    Yang Ming
    2013 INTERNATIONAL CONFERENCE ON VIRTUAL REALITY AND VISUALIZATION (ICVRV 2013), 2013, : 313 - 316