Multi-label Emotion Classification of COVID-19 Tweets with Deep Learning and Topic Modelling

Authors
Anuratha K. [1]
Parvathy M. [2]
Affiliations
[1] Department of Information Technology, Sri Sai Ram Institute of Technology, Chennai, Tamil Nadu
[2] Department of Computer Science and Engineering, Sethu Institute of Technology, Madurai, Tamil Nadu
Source
Computer Systems Science and Engineering | 2023, Vol. 45, No. 3
Keywords
convolutional neural network (CNN); corona virus; COVID-19; emotion classification; healthcare; non-negative matrix factorization (NMF); sentiment classification; topic detection; Twitter
DOI
10.32604/csse.2023.031553
Abstract
The COVID-19 pandemic has been one of the most severe disease outbreaks of recent years. Because it affects the livelihood of people across the world, administrators and healthcare professionals need to be aware of community views in order to monitor the severity of the outbreak's spread. Public opinion is shared extensively on microblogging platforms such as Twitter, which is regarded as a popular source for collecting opinions on topics such as politics, sports and entertainment. This work combines an Intensity Based Emotion Classification Convolution Neural Network (IBEC-CNN) model with Non-negative Matrix Factorization (NMF) to detect and analyze the topics discussed in COVID-19 tweets as well as the intensity of their emotional content. Topics are identified using NMF, and emotions are classified using the pretrained IBEC-CNN on the basis of predefined intensity scores. The research aimed to identify the emotions in Indian tweets related to COVID-19 and to produce a list of topics discussed by users during the pandemic. A large number of COVID-19 tweets were retrieved between January and July 2020 using the Twitter Application Programming Interface (Twitter API). The extracted tweets are analyzed for the emotions fear, joy, sadness and trust with the pretrained IBEC-CNN model. Each classified tweet is given an intensity score from 1 to 3, where 1 denotes low, 2 moderate and 3 high intensity of the emotion. NMF is employed to identify the topics in the tweets and the themes of those topics.
The emotion analysis found that positive tweets outnumbered negative tweets during the period considered, with negative tweets accounting for less than 5% of COVID-19 tweets. In addition, more than 75% of the negative tweets expressing sadness and fear were of low intensity. A qualitative analysis was also conducted, and the detected topics were grouped into themes such as economic impacts, case reports, treatments, entertainment and vaccination. The results show that pandemic-related issues are expressed with different emotions on Twitter, which helps in interpreting public sentiment during the pandemic; these results are useful for planning the dissemination of factual health statistics to build public trust. A performance comparison shows that the proposed IBEC-CNN model outperforms conventional models, achieving 83.71% accuracy. The share of COVID-19 tweets discussing each topic varies from 7.45% to 26.43% across the topics Economy, Statistics on Cases, Government/Politics, Entertainment, Lockdown, Treatments and Virtual Events. Government/politics was the least discussed topic, whereas treatments were the most discussed. © 2023 CRL Publishing. All rights reserved.
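As an illustration of the topic-detection step named in the abstract, the following is a minimal sketch of generic NMF on a term-count matrix, using the multiplicative updates of Lee and Seung. It is not the authors' pipeline: the abstract does not state the preprocessing, term weighting, number of topics or solver, so the toy tweets, the helper names (`matmul`, `transpose`, `nmf`, `frobenius_error`) and the choice of `k=2` are all assumptions made for this example.

```python
import random

def matmul(a, b):
    # Plain-Python product of two list-of-lists matrices.
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

def nmf(v, k, iters=200, seed=0, eps=1e-9):
    """Approximate non-negative V (docs x terms) as W (docs x k) times H (k x terms)
    with multiplicative updates for the Frobenius-norm objective."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    # Strictly positive random initialization (an assumption of this sketch).
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return w, h

def frobenius_error(v, w, h):
    # Frobenius norm of the reconstruction residual V - W H.
    approx = matmul(w, h)
    return sum((a - b) ** 2 for rv, ra in zip(v, approx) for a, b in zip(rv, ra)) ** 0.5

# Toy, made-up tweets (illustrative only, not taken from the paper's dataset).
tweets = [
    "vaccine trial results vaccine dose",
    "vaccine dose approved trial",
    "lockdown extended shops closed",
    "shops closed lockdown curfew",
]
vocab = sorted({term for tw in tweets for term in tw.split()})
v = [[tw.split().count(term) for term in vocab] for tw in tweets]

w, h = nmf(v, k=2)
for topic_id, weights in enumerate(h):
    # The highest-weighted terms in each row of H describe one topic.
    top = sorted(zip(weights, vocab), reverse=True)[:3]
    print(topic_id, [term for _, term in top])
```

Each row of `h` weights the vocabulary for one topic, and each row of `w` gives a tweet's loading on those topics; in practice a library implementation with TF-IDF weighting would likely replace this hand-written loop.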
Pages: 3005-3021
Page count: 16