Topic Modeling for Tracking COVID-19 Communication on Twitter

Times cited: 0
Authors
Bogovic, Petar Kristijan [1, 2]
Mestrovic, Ana [1, 2]
Martincic-Ipsic, Sanda [1, 2]
Affiliations
[1] Univ Rijeka, Fac Informat & Digital Technol, Radmile Matejcic 2, Rijeka 51000, Croatia
[2] Univ Rijeka, Ctr Artificial Intelligence & Cybersecur, Radmile Matejcic 2, Rijeka 51000, Croatia
Source
INFORMATION AND SOFTWARE TECHNOLOGIES, ICIST 2022 | 2022 / Vol. 1665
Keywords
Topic modeling; Latent Dirichlet Allocation; Coherence score; Croatian tweets; COVID-19; infodemic
DOI
10.1007/978-3-031-16302-9_19
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this study, we analyze trends in COVID-19-related communication in the Croatian language on Twitter. First, we prepare a dataset of 147,028 tweets about COVID-19 posted during the first three waves of the pandemic, and then perform an analysis in three steps. In the first step, we train an LDA model and calculate the coherence values of the topics. We identify seven topics and report the ten most frequent words for each topic. In the second step, we analyze the proportion of tweets in each topic and report how these proportions change over time. In the third step, we study the spreading properties of each topic. The results show that all seven topics are evenly distributed across the three pandemic waves. The topic "vaccination" stands out, rising from 14.6% of tweets in the first wave to 25.7% in the third wave. The obtained results contribute to a better understanding of pandemic communication on social media in Croatia.
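The second step in the abstract (the share of tweets per topic and how it shifts between waves) can be illustrated with a minimal sketch. It assumes each tweet has already been assigned a single dominant topic by a trained LDA model. The function name `topic_share_by_wave`, the toy data, and the topic labels other than "vaccination" are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict


def topic_share_by_wave(assignments):
    """Return {wave: {topic: percentage of that wave's tweets}}.

    `assignments` is an iterable of (wave, topic) pairs, one per tweet.
    """
    counts = defaultdict(Counter)
    for wave, topic in assignments:
        counts[wave][topic] += 1

    shares = {}
    for wave, topic_counts in counts.items():
        total = sum(topic_counts.values())
        shares[wave] = {
            topic: 100.0 * n / total for topic, n in topic_counts.items()
        }
    return shares


# Hypothetical toy data: (pandemic wave, dominant topic) for eight tweets.
toy_assignments = [
    (1, "vaccination"), (1, "measures"), (1, "measures"), (1, "cases"),
    (3, "vaccination"), (3, "vaccination"), (3, "measures"), (3, "cases"),
]

shares = topic_share_by_wave(toy_assignments)
change = shares[3]["vaccination"] - shares[1]["vaccination"]
print(
    f"vaccination: {shares[1]['vaccination']:.1f}% -> "
    f"{shares[3]['vaccination']:.1f}% ({change:+.1f} points)"
)
```

Normalizing within each wave, rather than over the whole dataset, keeps waves with different tweet volumes comparable, which is the kind of comparison the reported 14.6% and 25.7% figures imply.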
Pages: 248-258
Page count: 11