Topic representation model based on microblogging behavior analysis

Cited by: 10
Authors
Han, Weihong [1 ]
Tian, Zhihong [1 ]
Huang, Zizhong [2 ]
Li, Shudong [1 ]
Jia, Yan [3 ]
Affiliations
[1] Guangzhou Univ, Cyberspace Inst Adv Technol, Guangzhou 510006, Peoples R China
[2] Natl Univ Def Technol, Comp Sch, Changsha 410073, Peoples R China
[3] Cyberspace Secur Res Ctr, Peng Cheng Lab, Shenzhen 518000, Peoples R China
Source
WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS | 2020 / Vol. 23 / Issue 06
Keywords
Topic representation model; Behavior analysis; Word distribution; LDA model; Topic detection; INTERNET;
DOI
10.1007/s11280-020-00822-x
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Microblogging has become an important way for people to obtain information, express opinions, and make suggestions. Identifying new topics quickly and accurately from massive microblogging data plays a crucial role in recommending information and controlling public opinion. The topic representation model provides a basis for topic detection. In this paper, we propose a topic representation model based on user behavior analysis, the microblogging behavior analysis-latent Dirichlet allocation (MBA-LDA) model, for microblogging datasets. The topic-word distribution is acquired by an LDA model that considers information on user behaviors (such as posting, forwarding, and commenting) and the distribution of words among documents within one topic and among different topics. The model also re-assesses the importance of words in topic representation. The basic idea is that the distribution of words within a topic or among different topics has a great influence on the selection of topic expression words. If a word is evenly distributed among all documents of a certain topic, it is a common word of all documents in that topic and is more suitable to represent the topic. If a word is evenly distributed among the various topics, it is a common word of all topics and cannot distinguish between them, so it is less suitable to represent any topic. In experiments on a real Sina Microblogging data set, the topic model based on the MBA-LDA algorithm makes the representative words more important and increases the differentiation of topic words, which effectively improves the accuracy of subsequent topic detection and evolutionary analysis.
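The re-assessment idea in the abstract can be illustrated with a small sketch. The abstract gives no formulas, so everything here is an assumption made for illustration and not the authors' MBA-LDA method: evenness is measured with normalized entropy, the function names `normalized_entropy` and `reweight` are hypothetical, the two evenness scores are combined by simple multiplication, and the user-behavior signals (posting, forwarding, commenting) are left out entirely.

```python
import math

def normalized_entropy(counts):
    """Evenness of a count vector in [0, 1].

    1 means the counts are spread perfectly evenly, 0 means they are
    concentrated in a single position. (Assumed evenness measure.)
    """
    total = sum(counts)
    if total == 0 or len(counts) < 2:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(counts))

def reweight(topic_word_prob, within_topic_doc_counts, across_topic_counts):
    """Hypothetical re-assessment of a word's weight for one topic.

    topic_word_prob: the word's probability under the topic (e.g. from LDA).
    within_topic_doc_counts: the word's count in each document of the topic.
    across_topic_counts: the word's count in each topic.
    """
    within = normalized_entropy(within_topic_doc_counts)  # high is good
    across = normalized_entropy(across_topic_counts)      # high is bad
    return topic_word_prob * within * (1.0 - across)

# A word shared by every document of one topic and absent elsewhere
# keeps its weight; a word spread evenly over all topics loses it.
specific = reweight(0.1, [3, 3, 3], [9, 0, 0])
generic = reweight(0.1, [3, 3, 3], [9, 9, 9])
print(specific > generic)
```

Under these assumptions, a word that is even within its topic but concentrated in that topic retains its LDA probability, while a word that is equally frequent in every topic is pushed toward zero, which matches the qualitative behavior the abstract describes.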
Pages: 3083-3097
Page count: 15
Related papers
50 in total
  • [21] Topic Detection based on Deep Learning Language Model in Turkish Microblogs
    Sahinuc, Furkan
    Toraman, Cagri
    Koc, Aykut
    29TH IEEE CONFERENCE ON SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS (SIU 2021), 2021,
  • [22] Study on Food Safety Emergency Topic Detection Model Based on Semantics
    Liang, Meiyu
    Du, Junping
    Hu, Juan
    Yang, Yuehua
    PROCEEDINGS OF 2011 INTERNATIONAL CONFERENCE ON ADVANCED INTELLIGENCE AND AWARENESS INTERNET, IET AIAI2011, 2011, : 114 - 118
  • [23] Evaluating the latest trends of Industry 4.0 based on LDA topic model
    Ozyurt, Ozcan
    Ozkose, Hakan
    Ayaz, Ahmet
    JOURNAL OF SUPERCOMPUTING, 2024, 80 (13) : 19003 - 19030
  • [24] WOF: Towards Behavior Analysis and Representation of Emotions in Adaptive Systems
    Alloui, Ilham
    Vernier, Flavien
    SOFTWARE TECHNOLOGIES ( ICSOFT 2017), 2018, 868 : 244 - 267
  • [25] A Comparative Study on Text Representation Models for Topic Detection in Arabic
    Koulali, Rim
    Meziane, Abdelouafi
    COMPUTACION Y SISTEMAS, 2019, 23 (03): : 683 - 691
  • [26] TweetSemMiner: A Meta-Topic Identification Model for Twitter Using Semantic Analysis
    Menendez, Hector D.
    Delgado-Calle, Carlos
    Camacho, David
    INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2014, 2014, 8669 : 69 - 76
  • [27] Sentiment Analysis and Topic Classification based on Binary Maximum Entropy Classifiers
    Batista, Fernando
    Ribeiro, Ricardo
    PROCESAMIENTO DEL LENGUAJE NATURAL, 2013, (50): : 77 - 84
  • [28] A pattern-based topic detection and analysis system on Chinese tweets
    Zhang, Lu
    Wu, Zhiang
    Bu, Zhan
    Jiang, Ye
    Cao, Jie
    JOURNAL OF COMPUTATIONAL SCIENCE, 2018, 28 : 369 - 381
  • [29] Topic Detection by Topic Model Induced Distance Using Biased Initiation
    Wu, Yonghui
    Ding, Yuxin
    Wang, Xiaolong
    Xu, Jun
    ADVANCES IN COMPUTER SCIENCE AND INFORMATION TECHNOLOGY, PROCEEDINGS, 2010, 6059 : 310 - +
  • [30] Movie Recommendation System Using Sentiment Analysis From Microblogging Data
    Kumar, Sudhanshu
    De, Kanjar
    Roy, Partha Pratim
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2020, 7 (04): : 915 - 923