HashCat: A Novel Approach for the Topic Classification of Multilingual Twitter Trends

Cited by: 2
Authors
Kausar, Soufia [1 ]
Tahir, Bilal [1 ]
Mehmood, Muhammad Amir [1 ]
Affiliations
[1] Univ Engn & Technol, Al Khawarizmi Inst Comp Sci, Lahore, Pakistan
Source
2021 INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT 2021) | 2021
Keywords
Twitter hashtags; topic classification; hashtag segmentation
DOI
10.1109/FIT53504.2021.00047
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
With the growing usage of online social networks, users generate an enormous amount of data daily. Twitter groups tweets that share a hashtag, which lets users extract the information they need about a target hashtag with little effort. However, understanding these hashtags is challenging because tweets contain short, multilingual content and non-standard vocabulary. In this article, we propose HashCat, a novel approach for the topic classification of multilingual Twitter trends. In addition, we present a technique for segmenting English and Urdu hashtags. First, we develop HT-Dat, a labelled dataset of 1,882 Urdu and English hashtags, each manually assigned to one of six broad categories. Next, we use the features of i) tweet text, ii) co-occurrence, and iii) segment similarity to classify hashtags. HashCat achieves an overall accuracy of 0.93 on the HT-Dat dataset. The classification results and a Type-Token Ratio analysis of the hashtag categories reveal that HashCat classifies categories with low lexical diversity more accurately. We believe that our methodology can help social media analysts conduct research on domain-specific hashtags.
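The Type-Token Ratio mentioned in the abstract is the standard lexical-diversity measure: the number of distinct tokens (types) divided by the total number of tokens. The sketch below illustrates that definition only. It is not the authors' implementation; the whitespace tokenization, the function name `type_token_ratio`, and the toy tweets are all assumptions made for illustration.

```python
def type_token_ratio(texts):
    """Return distinct tokens / total tokens over a collection of texts.

    Assumption: tokens are lowercased, whitespace-separated words. The
    paper's actual preprocessing is not described in the abstract.
    """
    tokens = []
    for text in texts:
        tokens.extend(text.lower().split())
    if not tokens:
        return 0.0  # avoid division by zero on empty input
    return len(set(tokens)) / len(tokens)


# Hypothetical toy data, not drawn from HT-Dat.
sports = [
    "pakistan won the match",
    "what a match pakistan",
    "pakistan match today",
]
mixed = [
    "new phone launch today",
    "election results announced",
    "heavy rain in lahore",
]

print(round(type_token_ratio(sports), 3))  # repeated vocabulary, lower ratio
print(round(type_token_ratio(mixed), 3))   # no repeated words, ratio of 1.0
```

A lower ratio means the same words recur across tweets, which is the sense of "low lexical diversity" used in the abstract.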
Pages: 212-217
Number of pages: 6