IRLCov19: A Large COVID-19 Multilingual Twitter Dataset of Indian Regional Languages

Authors
Uniyal, Deepak [1]
Agarwal, Amit [2]
Affiliations
[1] Graph Era Univ, Dehra Dun, Uttarakhand, India
[2] IIT Roorkee, Roorkee, Uttar Pradesh, India
Source
MACHINE LEARNING AND PRINCIPLES AND PRACTICE OF KNOWLEDGE DISCOVERY IN DATABASES, PT II | 2021 / Vol. 1525
Keywords
COVID-19; Twitter; Indian Regional Languages; Natural Language Processing;
DOI
10.1007/978-3-030-93733-1_22
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
COVID-19, which emerged in Wuhan, China, in December 2019, continues to spread rapidly across the world even though authorities have made a number of vaccines available. Although the coronavirus has been around for a significant period of time, people and authorities still feel the need for awareness because the virus mutates, and symptoms and prevention strategies vary accordingly. People and authorities rely mostly on social media platforms to share awareness information and voice their opinions, because these platforms can spread the word to a massive audience in practically no time. People communicate on social media in a number of languages, chosen according to their familiarity, the language's outreach, and its availability on the platform. The entire world has been hit by the coronavirus, and India is the second worst-hit country in terms of the number of active coronavirus cases. India, being a multilingual country, offers a great opportunity to study the outreach of the various languages that are actively used across social media platforms. In this study, we aim to study a COVID-19-related dataset collected between February 2020 and July 2020, focusing specifically on regional languages in India. This could help the Government of India, various state governments, NGOs, researchers, and policymakers study different issues related to the pandemic. We found that English was the mode of communication in over 64% of tweets, while as many as twelve regional languages in India account for approximately 4.77% of tweets.
Pages: 309-324
Number of pages: 16