Design and analysis of a large-scale COVID-19 tweets dataset

Cited by: 109
Authors
Lamsal, Rabindra [1]
Affiliations
[1] Jawaharlal Nehru Univ, Sch Comp & Syst Sci, New Delhi 110067, India
Keywords
Social computing; Crisis computing; Sentiment analysis; Network analysis; Twitter data; TWITTER; SENTIMENT; TIME;
DOI
10.1007/s10489-020-02029-z
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
As of July 17, 2020, more than thirteen million people have been diagnosed with the Novel Coronavirus (COVID-19), and half a million people have already lost their lives due to this infectious disease. The World Health Organization declared the COVID-19 outbreak as a pandemic on March 11, 2020. Since then, social media platforms have experienced an exponential rise in the content related to the pandemic. In the past, Twitter data have been observed to be indispensable in the extraction of situational awareness information relating to any crisis. This paper presents COV19Tweets Dataset (Lamsal 2020a), a large-scale Twitter dataset with more than 310 million COVID-19 specific English language tweets and their sentiment scores. The dataset's geo version, the GeoCOV19Tweets Dataset (Lamsal 2020b), is also presented. The paper discusses the datasets' design in detail, and the tweets in both the datasets are analyzed. The datasets are released publicly, anticipating that they would contribute to a better understanding of spatial and temporal dimensions of the public discourse related to the ongoing pandemic. As per the stats, the datasets (Lamsal 2020a, 2020b) have been accessed over 74.5k times, collectively.
Pages: 2790-2804
Page count: 15
Related papers
44 in total
[1]   COVID-19 and the 5G Conspiracy Theory: Social Network Analysis of Twitter Data [J].
Ahmed, Wasim ;
Vidal-Alaball, Josep ;
Downing, Joseph ;
Lopez Segui, Francesc .
JOURNAL OF MEDICAL INTERNET RESEARCH, 2020, 22 (05)
[2]  
Alqurashi S., 2020, ARXIV200404315
[3]  
[Anonymous], 2020, J MED INTERNET RES, DOI DOI 10.2196/19016
[4]  
[Anonymous], 2020, Estonia COVID-19 Statistics. Last Modified 2024
[5]   A Large-Scale COVID-19 Twitter Chatter Dataset for Open Scientific Research-An International Collaboration [J].
Banda, Juan M. ;
Tekumalla, Ramya ;
Wang, Guanyu ;
Yu, Jingyuan ;
Liu, Tuo ;
Ding, Yuning ;
Artemova, Ekaterina ;
Tutubalina, Elena ;
Chowell, Gerardo .
EPIDEMIOLOGIA, 2021, 2 (03) :315-324
[6]   Assessing Twitter Geocoding Resolution [J].
Bennett, Nicholas C. ;
Millard, David E. ;
Martin, David .
WEBSCI'18: PROCEEDINGS OF THE 10TH ACM CONFERENCE ON WEB SCIENCE, 2018, :239-243
[7]   Fast unfolding of communities in large networks [J].
Blondel, Vincent D. ;
Guillaume, Jean-Loup ;
Lambiotte, Renaud ;
Lefebvre, Etienne .
JOURNAL OF STATISTICAL MECHANICS-THEORY AND EXPERIMENT, 2008,
[8]   A survey on fake news and rumour detection techniques [J].
Bondielli, Alessandro ;
Marcelloni, Francesco .
INFORMATION SCIENCES, 2019, 497 :38-55
[9]   Influence of fake news in Twitter during the 2016 US presidential election [J].
Bovet, Alexandre ;
Makse, Hernan A. .
NATURE COMMUNICATIONS, 2019, 10 (1)
[10]   "Right Time, Right Place" Health Communication on Twitter: Value and Accuracy of Location Information [J].
Burton, Scott H. ;
Tanner, Kesler W. ;
Giraud-Carrier, Christophe G. ;
West, Joshua H. ;
Barnes, Michael D. .
JOURNAL OF MEDICAL INTERNET RESEARCH, 2012, 14 (06) :366-376