Embedding Semantic Anchors to Guide Topic Models on Short Text Corpora

Cited by: 3
Authors
Steuber, Florian [1 ]
Schneider, Sinclair [1 ]
Schoenfeld, Mirco [2 ]
Affiliations
[1] Univ Bundeswehr Munchen, Res Inst CODE, Neubiberg, Germany
[2] Univ Bayreuth, Bayreuth, Germany
Keywords
Topic modeling; Short text; Word embedding; Transfer learning; Big data;
DOI
10.1016/j.bdr.2021.100293
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Documents on the social media platform Twitter are written in a short and simple style rather than extensively and elaborately. Further, the core message of a post is often encoded into characteristic phrases called hashtags. These hashtags illustrate the semantics of a post or tie it to a specific topic. In this paper, we propose multiple approaches that use hashtags and their surrounding texts to improve topic modeling of short texts. We use transfer learning by applying a pre-trained word embedding of hashtags to derive preliminary topics. These function as supervising information, or seed topics, and are passed to Archetypal LDA (A-LDA), a recent variant of Latent Dirichlet Allocation. We demonstrate the effectiveness of our approach using a large corpus of Twitter posts as an example. Our approaches improve the topic model's quality in terms of various quantitative metrics. Moreover, the presented algorithms used to extract seed topics can be utilized as a form of lightweight topic model by themselves. Hence, our approaches create additional analytical opportunities and can help to gain a more detailed understanding of what people are talking about on social media. By using big data in the form of millions of tweets for preprocessing and fine-tuning, we enable the classification algorithm to produce topics that are very coherent to the reader. (C) 2021 Elsevier Inc. All rights reserved.
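The idea of deriving seed topics from a hashtag embedding can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy vectors, the `seed_topics` helper, and the similarity threshold are all hypothetical, and a simple greedy cosine-similarity grouping stands in for whatever extraction method the paper uses. In practice the vectors would come from a word embedding pre-trained on a large tweet corpus, and the resulting groups would be handed to a guided topic model such as A-LDA.

```python
import math

# Hypothetical toy embedding (hashtag -> vector). Real vectors would come
# from a word embedding pre-trained on millions of tweets.
HASHTAG_VECTORS = {
    "#football": [0.9, 0.1, 0.0],
    "#soccer": [0.8, 0.2, 0.0],
    "#election": [0.0, 0.9, 0.1],
    "#vote": [0.1, 0.8, 0.1],
    "#recipe": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def seed_topics(vectors, threshold=0.9):
    """Greedily group hashtags into seed topics.

    A hashtag joins the first existing group whose first member is at least
    `threshold` similar to it; otherwise it starts a new group.
    This is an illustrative stand-in, not the method from the paper.
    """
    topics = []
    for tag in sorted(vectors):  # sorted for a deterministic result
        for topic in topics:
            if cosine(vectors[tag], vectors[topic[0]]) >= threshold:
                topic.append(tag)
                break
        else:
            topics.append([tag])
    return topics


SEEDS = seed_topics(HASHTAG_VECTORS)
for i, topic in enumerate(SEEDS):
    print(f"seed topic {i}: {topic}")
```

Each resulting group acts as a small set of anchor terms for one topic. Greedy grouping keeps the sketch short; a clustering method that does not depend on iteration order would be a more robust choice on real data.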
Pages: 13
Related papers
50 in total
  • [11] A systematic review of the use of topic models for short text social media analysis
    Laureate, Caitlin Doogan Poet
    Buntine, Wray
    Linger, Henry
    ARTIFICIAL INTELLIGENCE REVIEW, 2023, 56 (12) : 14223 - 14255
  • [12] Short text topic modelling using local and global word-context semantic correlation
    Kinariwala, Supriya
    Deshmukh, Sachin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2023, 82 (17) : 26411 - 26433
  • [13] Short text topic modelling using local and global word-context semantic correlation
    Supriya Kinariwala
    Sachin Deshmukh
    Multimedia Tools and Applications, 2023, 82 : 26411 - 26433
  • [14] Comprehensive Analysis of Topic Models for Short and Long Text Data
    Goyal, Astha
    Kashyap, Indu
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (12) : 249 - 259
  • [15] SeNSe: embedding alignment via semantic anchors selection
    Malandri, Lorenzo
    Mercorio, Fabio
    Mezzanzanica, Mario
    Pallucchini, Filippo
    INTERNATIONAL JOURNAL OF DATA SCIENCE AND ANALYTICS, 2024, 20 (1) : 167 - 181
  • [16] Fuzzy topic modeling approach for text mining over short text
    Rashid, Junaid
    Shah, Syed Muhammad Adnan
    Irtaza, Aun
    INFORMATION PROCESSING & MANAGEMENT, 2019, 56 (06)
  • [17] TopicNets: Visual Analysis of Large Text Corpora with Topic Modeling
    Gretarsson, Brynjar
    O'Donovan, John
    Bostandjiev, Svetlin
    Hoellerer, Tobias
    Asuncion, Arthur
    Newman, David
    Smyth, Padhraic
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2012, 3 (02)
  • [18] Filtering out the noise in short text topic modeling
    Li, Ximing
    Wang, Yue
    Zhang, Ang
    Li, Changchun
    Chi, Jinjin
    Ouyang, Jihong
    INFORMATION SCIENCES, 2018, 456 : 83 - 96
  • [19] Short text topic modeling by exploring original documents
    Ximing Li
    Changchun Li
    Jinjin Chi
    Jihong Ouyang
    Knowledge and Information Systems, 2018, 56 : 443 - 462
  • [20] A general framework to expand short text for topic modeling
    Bicalho, Paulo
    Pita, Marcelo
    Pedrosa, Gabriel
    Lacerda, Anisio
    Pappa, Gisele L.
    INFORMATION SCIENCES, 2017, 393 : 66 - 81