HashCat: A Novel Approach for the Topic Classification of Multilingual Twitter Trends

Cited by: 2
Authors
Kausar, Soufia [1]
Tahir, Bilal [1]
Mehmood, Muhammad Amir [1]
Affiliation
[1] Univ Engn & Technol, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan
Source
2021 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT 2021) | 2021
Keywords
Twitter hashtags; topic classification; hashtag segmentation
DOI
10.1109/FIT53504.2021.00047
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
With the growing usage of online social networks, users generate an enormous amount of data every day. The Twitter microblog groups tweets that share a hashtag, which lets users easily extract the information they need about a target hashtag. However, understanding these hashtags is challenging because tweets are short, often multilingual, and full of non-standard vocabulary. In this article, we propose HashCat, a novel approach for the topic classification of multilingual Twitter trends. In addition, we present a technique for segmenting English and Urdu hashtags. First, we develop HT-Dat, a labelled dataset of 1,882 Urdu and English hashtags manually labelled into six broad categories. Next, we classify hashtags using three types of features: (i) tweet text, (ii) co-occurrence, and (iii) segment similarity. HashCat achieves an overall accuracy of 0.93 on the HT-Dat dataset. The classification results and a Type-Token Ratio analysis of the hashtag categories reveal that categories with low lexical diversity are classified with higher accuracy by HashCat. We believe that our methodology can help social media analysts conduct research on domain-specific hashtags.
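The Type-Token Ratio (TTR) mentioned in the abstract is the number of distinct word types divided by the total number of tokens, so a lower value means a category reuses the same vocabulary more often (lower lexical diversity). The following is a minimal sketch of a per-category TTR computation, not the authors' implementation. It assumes simple lowercase whitespace tokenization and uses hypothetical toy categories and tweets. The paper's actual tokenization, including how it handles Urdu script, is not described in this record.

```python
from collections import defaultdict


def type_token_ratio(tokens):
    """Return |distinct tokens| / |tokens|, or 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def ttr_by_category(tweets):
    """Compute one TTR per category.

    tweets: iterable of (category, text) pairs. All tokens of a category
    are pooled before the ratio is taken.
    """
    pooled = defaultdict(list)
    for category, text in tweets:
        # Assumption: lowercase + whitespace split stands in for a real tokenizer.
        pooled[category].extend(text.lower().split())
    return {cat: type_token_ratio(toks) for cat, toks in pooled.items()}


if __name__ == "__main__":
    # Hypothetical toy data; these are not HT-Dat categories or tweets.
    sample = [
        ("sports", "great match great win"),
        ("sports", "win win great"),
        ("politics", "election results senate vote debate"),
    ]
    print(ttr_by_category(sample))
```

In the toy data, "sports" pools 7 tokens with only 3 distinct types (TTR 3/7), while "politics" has 5 tokens and 5 types (TTR 1.0). Raw TTR tends to fall as the number of tokens grows, so comparing categories of very different sizes is fairer on equal-sized token samples.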
Pages: 212-217
Page count: 6