WELFake: Word Embedding Over Linguistic Features for Fake News Detection

Cited by: 103
Authors
Verma, Pawan Kumar [1, 2]
Agrawal, Prateek [2, 3]
Amorim, Ivone [4, 5]
Prodan, Radu [3]
Affiliations
[1] GLA Univ, Dept Comp Engn & Applicat, Mathura 281406, India
[2] Lovely Profess Univ, Sch Comp Sci & Engn, Phagwara 144411, India
[3] Univ Klagenfurt, Inst Informat Technol, A-9020 Klagenfurt, Austria
[4] MOG Technol, P-4470605 Moreira, Portugal
[5] Univ Porto, CMUP Math Res Ctr, P-4099002 Porto, Portugal
Funding
European Union Horizon 2020
Keywords
Social networking (online); Linguistics; Data models; Bit error rate; Feature extraction; Training; Vegetation; Bidirectional encoder representations from transformer (BERT); convolutional neural network (CNN); fake news; linguistic feature; machine learning (ML); text classification; voting classifier; word embedding (WE); DECEPTION; CUES;
DOI
10.1109/TCSS.2021.3068519
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Social media is a popular medium for disseminating real-time news all over the world. Easy and quick information proliferation is one of the reasons for its popularity. A large number of users of different age groups, genders, and societal beliefs engage with social media websites. Despite these favorable aspects, a significant disadvantage comes in the form of fake news, as people usually read and share information without checking its genuineness. Therefore, it is imperative to research methods for authenticating news. To address this issue, this article proposes a two-phase benchmark model named WELFake, based on word embedding (WE) over linguistic features, for fake news detection using machine learning classification. The first phase preprocesses the data set and validates the veracity of news content using linguistic features. The second phase merges the linguistic feature sets with WE and applies voting classification. To validate its approach, this article also carefully designs a novel WELFake data set of approximately 72 000 articles, which combines different data sets to generate an unbiased classification output. Experimental results show that the WELFake model categorizes news as real or fake with an accuracy of 96.73%, which improves the overall accuracy by 1.31% compared to bidirectional encoder representations from transformer (BERT) and 4.25% compared to convolutional neural network (CNN) models. Our frequency-based model, which focuses on analyzing writing patterns, outperforms predictive-based related works implemented using the Word2vec WE method by up to 1.73%.
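The second phase described in the abstract (merging linguistic features with a frequency-based word representation, then voting) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the five linguistic features, the raw word-count vector standing in for the WE step, and the helper names are hypothetical and are not taken from the paper. The classifiers themselves are omitted; only the feature merge and a hard majority vote over their predicted labels are shown.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z']+", text.lower())


def linguistic_features(text):
    """Illustrative writing-pattern features (hypothetical, not the paper's set)."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    avg_word_len = sum(len(w) for w in words) / n_words if n_words else 0.0
    n_upper = sum(1 for w in words if len(w) > 1 and w.isupper())
    upper_ratio = n_upper / n_words if n_words else 0.0
    return [
        float(n_words),          # word count
        float(len(sentences)),   # sentence count
        avg_word_len,            # average word length
        float(text.count("!")),  # exclamation marks
        upper_ratio,             # share of all-caps words
    ]


def build_vocabulary(texts):
    """Map each distinct token to a column index (sorted for determinism)."""
    vocab = sorted({w for t in texts for w in tokenize(t)})
    return {w: i for i, w in enumerate(vocab)}


def count_vector(text, vocab):
    """Frequency-based representation: raw term counts over the vocabulary."""
    vec = [0.0] * len(vocab)
    for word, count in Counter(tokenize(text)).items():
        if word in vocab:
            vec[vocab[word]] = float(count)
    return vec


def merged_features(text, vocab):
    """Concatenate the word-count vector with the linguistic features."""
    return count_vector(text, vocab) + linguistic_features(text)


def majority_vote(predictions):
    """Hard voting: return the label predicted by most classifiers."""
    return Counter(predictions).most_common(1)[0][0]
```

For example, `merged_features(article_text, vocab)` yields one row of a feature matrix that several classifiers could be trained on, and `majority_vote(["fake", "real", "fake"])` combines their per-article labels. Using an odd number of voters avoids ties in the binary real/fake case.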
Pages: 881-893
Number of pages: 13
Related papers
  • [2] Word embedding and classification methods and their effects on fake news detection
    Hauschild, Jessica
    Eskridge, Kent
    MACHINE LEARNING WITH APPLICATIONS, 2024, 17
  • [3] Linguistic features based framework for automatic fake news detection
    Garg, Sonal
    Sharma, Dilip Kumar
    COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 172
  • [4] Enhancing Fake News Detection with Word Embedding: A Machine Learning and Deep Learning Approach
    Al-Tarawneh, Mutaz A. B.
    Al-irr, Omar
    Al-Maaitah, Khaled S.
    Kanj, Hassan
    Aly, Wael Hosny Fouad
    COMPUTERS, 2024, 13 (09)
  • [5] An empiric validation of linguistic features in machine learning models for fake news detection
    Puraivan, Eduardo
    Venegas, Rene
    Riquelme, Fabian
    DATA & KNOWLEDGE ENGINEERING, 2023, 147
  • [7] Fighting the Fake: A Forensic Linguistic Analysis to Fake News Detection
    Sousa-Silva, Rui
    INTERNATIONAL JOURNAL FOR THE SEMIOTICS OF LAW-REVUE INTERNATIONALE DE SEMIOTIQUE JURIDIQUE, 2022, 35 (06): : 2409 - 2433
  • [8] A Hybrid Approach for Fake News Detection in Twitter Based on User Features and Graph Embedding
    Hamdi, Tarek
    Slimi, Hamda
    Bounhas, Ibrahim
    Slimani, Yahya
    DISTRIBUTED COMPUTING AND INTERNET TECHNOLOGY (ICDCIT 2020), 2020, 11969 : 266 - 280
  • [9] A deep neural network approach for fake news detection using linguistic and psychological features
    Arunthavachelvan, Keshopan
    Raza, Shaina
    Ding, Chen
    USER MODELING AND USER-ADAPTED INTERACTION, 2024, 34 (04) : 1043 - 1070
  • [10] Linguistic Features and Bi-LSTM for Identification of Fake News
    Ali, Attar Ahmed
    Latif, Shahzad
    Ghauri, Sajjad A.
    Song, Oh-Young
    Abbasi, Aaqif Afzaal
    Malik, Arif Jamal
    ELECTRONICS, 2023, 12 (13)