Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora

Cited by: 3
Authors
Steuber, Florian [1]
Schneider, Sinclair [1]
Schoenfeld, Mirco [2]
Affiliations
[1] Univ Bundeswehr Munchen, Res Inst CODE, Neubiberg, Germany
[2] Univ Bayreuth, Bayreuth, Germany
Keywords
Topic modeling; Short text; Word embedding; Transfer learning; Big data;
DOI
10.1016/j.bdr.2021.100293
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Documents on the social media platform Twitter are written in a short and simple style rather than extensively and elaborately. Further, the core message of a post is often encoded in characteristic phrases called hashtags. These hashtags illustrate the semantics of a post or tie it to a specific topic. In this paper, we propose multiple approaches that use hashtags and their surrounding text to improve topic modeling of short texts. We use transfer learning by applying a pre-trained word embedding of hashtags to derive preliminary topics. These function as supervising information, or seed topics, and are passed to Archetypal LDA (A-LDA), a recent variant of Latent Dirichlet Allocation. We demonstrate the effectiveness of our approach on a large corpus of posts, using Twitter as an example. Our approaches improve the topic model's quality in terms of various quantitative metrics. Moreover, the presented algorithms for extracting seed topics can themselves be used as a form of lightweight topic model. Hence, our approaches create additional analytical opportunities and can help to gain a more detailed understanding of what people are talking about on social media. By using big data, in the form of millions of tweets, for preprocessing and fine-tuning, we enable the classification algorithm to produce topics that are very coherent to the reader. (C) 2021 Elsevier Inc. All rights reserved.
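The abstract does not spell out how seed topics are derived from the hashtag embedding, so the following is only an illustrative sketch of one plausible reading: hashtags are grouped by cosine similarity of their embedding vectors, and each group serves as a preliminary (seed) topic. The function name `seed_topics`, the spherical k-means clustering, and the two-dimensional toy vectors are all assumptions made for illustration; they are not the authors' algorithm and not real pre-trained embeddings. Passing the resulting groups to a seeded topic model such as A-LDA is not shown.

```python
import numpy as np

def seed_topics(hashtags, vectors, k, iters=10):
    """Group hashtags into k seed topics with spherical k-means (illustrative assumption)."""
    X = np.asarray(vectors, dtype=float)
    # Scale to unit length so that a dot product equals cosine similarity.
    X = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Deterministic farthest-first initialisation, starting from the first hashtag.
    centers = [X[0]]
    while len(centers) < k:
        best_sim = np.max(X @ np.array(centers).T, axis=1)
        centers.append(X[int(np.argmin(best_sim))])
    C = np.array(centers)

    for _ in range(iters):
        # Assign each hashtag to its most similar centre.
        labels = np.argmax(X @ C.T, axis=1)
        # Move each centre to the normalised sum of its members.
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                n = np.linalg.norm(m)
                if n > 0:
                    C[j] = m / n

    labels = np.argmax(X @ C.T, axis=1)
    return [[h for h, l in zip(hashtags, labels) if l == j] for j in range(k)]

# Hypothetical toy data standing in for a pre-trained hashtag embedding.
hashtags = ["#football", "#soccer", "#goal", "#election", "#vote", "#politics"]
vectors = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05],
           [0.1, 0.9], [0.2, 0.8], [0.05, 0.95]]

topics = seed_topics(hashtags, vectors, k=2)
for i, topic in enumerate(topics):
    print(f"seed topic {i}: {topic}")
```

With these toy vectors the sports-related and politics-related hashtags fall into separate groups. In practice the vectors would come from an embedding trained on a large tweet corpus, and the number of groups `k` would have to be chosen for the data at hand.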
Pages: 13
Related papers
50 in total
  • [1] Spatial Temporal Topic Embedding: A Semantic Modeling Method for Short Text in Social Network
    Yang, Congxian
    Du, Junping
    Kou, Feifei
    Lee, Jangmyung
    ARTIFICIAL INTELLIGENCE (ICAI 2018), 2018, 888 : 198 - 210
  • [2] Probabilistic topic modeling for short text based on word embedding networks
    Pita, Marcelo
    Nunes, Matheus
    Pappa, Gisele L.
    APPLIED INTELLIGENCE, 2022, 52 (15) : 17829 - 17844
  • [4] Incorporating Embedding to Topic Modeling for More Effective Short Text Analysis
    Rashid, Junaid
    Kim, Jungeun
    Naseem, Usman
    COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023, 2023, : 73 - 76
  • [5] Semantic Augmented Topic Model over Short Text
    Li, Lingyun
    Sun, Yawei
    Wang, Cong
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 652 - 656
  • [6] Short Text Clustering based on Word Semantic Graph with Word Embedding Model
    Jinarat, Supakpong
    Manaskasemsak, Bundit
    Rungsawang, Arnon
    2018 JOINT 10TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING AND INTELLIGENT SYSTEMS (SCIS) AND 19TH INTERNATIONAL SYMPOSIUM ON ADVANCED INTELLIGENT SYSTEMS (ISIS), 2018, : 1427 - 1432
  • [7] TextNetTopics Pro, a topic model-based text classification for short text by integration of semantic and document-topic distribution information
    Voskergian, Daniel
    Bakir-Gungor, Burcu
    Yousef, Malik
    FRONTIERS IN GENETICS, 2023, 14
  • [8] Combine Topic Modeling with Semantic Embedding: Embedding Enhanced Topic Model
    Zhang, Peng
    Wang, Suge
    Li, Deyu
    Li, Xiaoli
    Xu, Zhikang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2020, 32 (12) : 2322 - 2335
  • [9] TopExplorer: Tool Support for Extracting and Visualizing Topic Models in Bioengineering Text Corpora
    Cheng, Kwok Sun
    Wang, Zhipeng
    Huang, Pei-Chi
    Chundi, Parvathi
    Song, Myoungkyu
    2020 IEEE INTERNATIONAL CONFERENCE ON ELECTRO INFORMATION TECHNOLOGY (EIT), 2020, : 334 - 343
  • [10] Constructing Pseudo Documents with Semantic Similarity for Short Text Topic Discovery
    Lu, Heng-yang
    Li, Yun
    Tang, Chi
    Wang, Chong-jun
    Xie, Jun-yuan
    NEURAL INFORMATION PROCESSING (ICONIP 2018), PT V, 2018, 11305 : 437 - 449