Micro-Blog Topic Detection Method Based on BTM Topic Model and K-Means Clustering Algorithm

Cited by: 32
Authors
Li, Weijiang [1]
Feng, Yanming [1]
Li, Dongjun [2]
Yu, Zhengtao [1]
Affiliations
[1] Kunming Univ Sci & Technol, Dept Informat Engn & Automat, Kunming 650500, Peoples R China
[2] Soochow Univ, Jinan Qingqi Peugeot Motorcycle Co Ltd, R&D Dept, Jinan 250104, Shandong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
short text; topic model; topic discovery; K-means clustering
DOI
10.3103/S0146411616040040
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The development of micro-blogging, which generates large volumes of short text, has made communication more convenient. At the same time, discovering topics in short texts has become a genuinely difficult problem. Traditional topic models such as probabilistic latent semantic analysis (PLSA) and Latent Dirichlet Allocation (LDA) struggle to model short texts, because they suffer from severe data sparsity when applied to them. The K-means clustering algorithm, for its part, can make topics discriminative when the dataset is dense and the differences among topic documents are distinct. In this paper, the Biterm Topic Model (BTM) is used to process short micro-blog texts and alleviate the sparsity problem, and the K-means clustering algorithm is integrated with BTM for further topic discovery. Experimental results on Sina micro-blog short text collections demonstrate that the method can discover topics effectively.
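As a rough illustration of the two ingredients named in the abstract, the following Python sketch extracts biterms (unordered word pairs that co-occur in one short text) and runs a plain K-means over document-topic vectors. It is not the authors' implementation, and the abstract does not say how the two steps are combined. The toy corpus, the `doc_topic` vectors (standing in for the output of BTM inference, which is not implemented here), and the helper names `extract_biterms` and `kmeans` are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) co-occurring in one short text."""
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain Lloyd's K-means. Assumption: the first k vectors seed the centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: attach each vector to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy corpus of tokenised micro-blog posts (made up for illustration).
docs = [
    ["phone", "battery", "screen"],
    ["match", "goal", "team"],
]
# Corpus-level biterm counts: the co-occurrence statistics BTM models directly.
biterm_counts = Counter(b for d in docs for b in extract_biterms(d))

# Hypothetical document-topic vectors, standing in for BTM inference output.
doc_topic = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.2], [0.2, 0.8]]
labels, centroids = kmeans(doc_topic, k=2)
```

Pooling biterms over the whole corpus, rather than counting words per document, is what lets BTM sidestep the sparsity of individual short texts. Clustering the resulting topic vectors is one plausible way to group documents by topic afterwards.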
Pages: 271-277
Page count: 7