Fake news detection models using the largest social media ground-truth dataset (TruthSeeker)

Cited by: 0
Authors
Khalil M. [1]
Azzeh M. [1]
Affiliation
[1] Data Science Department, Princess Sumaya University for Technology, Amman
Keywords
Deep learning; Fake news detection; Machine learning; Text representation; Transformers
DOI
10.1007/s10772-024-10106-8
Abstract
Twitter is a powerful platform for communication and information sharing, but it is also susceptible to the spread of false information. Such false information has adverse consequences for society and can significantly affect public perception, decision-making, and political outcomes. There is therefore an urgent need for a fake news detection system that can accurately catch false information before it is disseminated. Building such a system requires high-quality, trustworthy labeled datasets. Existing datasets have clear limitations: most have not been updated to reflect the more advanced generation patterns used by recent fake news creators. The TruthSeeker research team has offered a large-scale fake news dataset that was labeled using Amazon Mechanical Turk. The dataset was collected between 2009 and 2022 and then validated according to a robust procedure to ensure its quality and reliability. However, the credibility and trustworthiness of this dataset are still open to question. In this paper, we study and analyze the feasibility of building a deep learning-based fake news detection model using the TruthSeeker dataset. We mainly investigate the impact of different text representation techniques on the accuracy of deep learning models. We also investigate how much the hand-crafted features supplied with the dataset contribute to the final results. The results show that the TruthSeeker dataset has the potential to help social media platforms detect fake news. Deep contextualized text representations produced more accurate results than the word2vec and TF-IDF techniques. The impact of hand-crafted features on the final performance of deep learning models is often negligible, and we suggest excluding them from the final models. © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2024.
Pages: 389-404
Number of pages: 15