WASM: A Dataset for Hashtag Recommendation for Arabic Tweets

Cited by: 1
Authors
Al-Shaibani, Maged S. [1]
Luqman, Hamzah [1, 2]
Al-Ghofaily, Abdulaziz S. [1]
Al-Najim, Abdullatif A. [1]
Affiliations
[1] King Fahd University of Petroleum & Minerals, Information & Computer Science Department, Dhahran, Saudi Arabia
[2] SDAIA-KFUPM Joint Research Center for Artificial Intelligence, Dhahran 31261, Saudi Arabia
Keywords
Hashtag Recommendation; Hashtag Generation; Tweets Classification; Arabic Tweets; Twitter; Hashtags
DOI
10.1007/s13369-023-08567-1
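Chinese Library Classification (CLC)
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract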
As one of the largest microblogging websites in the world, Twitter generates a huge amount of information daily. The sheer volume of this data makes it difficult for users to follow and receive information relevant to their interests. Therefore, Twitter allows users to annotate and categorize their tweets using appropriate hashtags. However, finding an appropriate hashtag for a tweet is not always straightforward. Furthermore, many users disrupt a hashtag's stream by posting content irrelevant to its topic. These problems increase the need for a hashtag recommendation and classification system. This topic has received considerable attention from researchers in some languages, such as English and Chinese. However, this problem has not yet been explored for the Arabic language owing to the lack of datasets. In this study, we bridge this gap by proposing WASM, an Arabic Twitter hashtag recommendation dataset consisting of more than 100,000 tweets annotated with 87 hashtags. The proposed dataset is subjected to several rounds of automatic and manual filtering to ensure that it is suitable for tasks related to tweets and hashtags. Further, we propose three systems for hashtag recommendation and classification. Each of these systems approaches the task differently, treating it as a classification, generation, or named entity recognition problem. The results obtained using these systems are promising and can be used to benchmark the WASM dataset. The data and code are available at https://github.com/Hamzah-Luqman/wasm.
Pages: 12131-12145
Page count: 15