Identifying Covid-19 misinformation tweets and learning their spatio-temporal topic dynamics using Nonnegative Coupled Matrix Tensor Factorization

Cited by: 0
Authors
Thirunavukarasu Balasubramaniam
Richi Nayak
Khanh Luong
Md. Abul Bashar
Affiliation
[1] Queensland University of Technology, School of Computer Science and Centre for Data Science
Source
Social Network Analysis and Mining | 2021 / Volume 11
Keywords
Covid-19; Misinformation detection; Topic modelling; Ranking; Spatio-temporal patterns; Nonnegative tensor factorization; Saturating Coordinate Descent
DOI
Not available
Abstract
Social media platforms like Twitter have become an easy portal for billions of people to connect and exchange their thoughts. Unfortunately, people commonly use these platforms to share misinformation, which can influence other people adversely. The spread of misinformation is unavoidable in an extraordinary situation like Covid-19, and the consequences can be dreadful. This paper proposes a two-step ranking-based misinformation detection (RMiD) technique. First, a novel ranking-based approach leveraging scalable information retrieval infrastructure is applied to detect misinformation in a huge collection of unlabelled tweets, based on a related but very small labelled misinformation data set. Second, the identified misinformation tweets are represented as a coupled matrix tensor model, and Nonnegative Coupled Matrix Tensor Factorization is applied to learn their spatio-temporal topic dynamics. The experimental analysis shows that RMiD detects misinformation with better coverage and less noise than existing techniques. Moreover, the coupled matrix tensor representation improves the quality of topics discovered from unlabelled data by up to 4% by leveraging the semantic similarity of terms in the labelled data.