Deep contextualized text representation and learning for fake news detection

Cited by: 66

Authors:

Samadi, Mohammadreza [1]

Mousavian, Maryam [1]

Momtazi, Saeedeh [1]

Affiliation:

[1] Amirkabir Univ Technol, Comp Engn Dept, Tehran, Iran

Source:

INFORMATION PROCESSING & MANAGEMENT | 2021 / Vol. 58 / Issue 06

Keywords:

Fake news detection; Deep neural network; Contextualized text representation

DOI:

10.1016/j.ipm.2021.102723

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

In recent years, the widespread use of social media and broadcasting agencies around the world has left people highly exposed to false information and fake news, which negatively affect both collective thought and governments' policies. At the same time, the great success of pre-trained models for embedding contextual information from text has motivated researchers to use these embeddings in a range of natural language processing tasks. However, in a complex task like fake news detection, it is not clear which contextualized embedding provides the classifier with the most valuable features. Because no comparative study has examined different contextualized pre-trained models in combination with distinct neural classifiers, we carry out such a comparison. In this paper, we propose three classifiers with different pre-trained models for embedding input news articles. We connect a Single-Layer Perceptron (SLP), a Multi-Layer Perceptron (MLP), and a Convolutional Neural Network (CNN) after an embedding layer consisting of novel pre-trained models such as BERT, RoBERTa, GPT2, and Funnel Transformer, in order to benefit from the deep contextualized representations these models provide as well as from deep neural classification. We evaluate our proposed models on three well-known fake news datasets: LIAR (Wang, 2017), ISOT (Ahmed et al., 2017), and COVID-19 (Patwa et al., 2020). The results on these three datasets show the superiority of our proposed models for fake news detection compared to the state-of-the-art models. The results show 7% and 0.1% improvements in classification accuracy over the model proposed by Goldani et al. (2021) on LIAR and ISOT, respectively. We also achieve a 1% improvement over the model proposed by Shifath et al. (2021) on the COVID-19 dataset.
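The abstract describes three classifier heads (SLP, MLP, CNN) placed after a contextualized embedding layer. The following is only a minimal sketch of that arrangement, not the authors' implementation. The abstract does not give the pooling strategy, layer sizes, filter width, or activations, so mean pooling, ReLU, a width-3 convolution, and all dimensions below are assumptions. Random vectors stand in for the token embeddings that a pre-trained encoder such as BERT would produce, and the weights are untrained placeholders.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Random matrix; a placeholder for trained weights or encoder output."""
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(vec, weights):
    """Multiply a length-d vector by a d x k weight matrix."""
    return [sum(v * w for v, w in zip(vec, col)) for col in zip(*weights)]

def relu(vec):
    return [max(0.0, v) for v in vec]

def softmax(vec):
    m = max(vec)
    exps = [math.exp(v - m) for v in vec]
    total = sum(exps)
    return [e / total for e in exps]

def mean_pool(tokens):
    """Average the token vectors into one document vector (assumed pooling)."""
    return [sum(col) / len(tokens) for col in zip(*tokens)]

def slp_head(tokens, w_out):
    """Single-layer perceptron: pooled embedding -> class probabilities."""
    return softmax(linear(mean_pool(tokens), w_out))

def mlp_head(tokens, w_hidden, w_out):
    """Multi-layer perceptron: one hidden ReLU layer before the output layer."""
    hidden = relu(linear(mean_pool(tokens), w_hidden))
    return softmax(linear(hidden, w_out))

def cnn_head(tokens, filters, w_out, width=3):
    """1-D convolution over token windows, max-over-time pooling, output layer."""
    # Each window concatenates `width` consecutive token vectors.
    windows = [sum(tokens[i:i + width], []) for i in range(len(tokens) - width + 1)]
    feature_maps = [relu(linear(win, filters)) for win in windows]
    pooled = [max(col) for col in zip(*feature_maps)]
    return softmax(linear(pooled, w_out))

# Hypothetical sizes, chosen only to keep the example small.
SEQ_LEN, DIM, HIDDEN, N_FILTERS, N_CLASSES, WIDTH = 8, 16, 12, 6, 2, 3

# Stand-in for the encoder output: one DIM-sized vector per token.
embeddings = rand_matrix(SEQ_LEN, DIM)

p_slp = slp_head(embeddings, rand_matrix(DIM, N_CLASSES))
p_mlp = mlp_head(embeddings, rand_matrix(DIM, HIDDEN), rand_matrix(HIDDEN, N_CLASSES))
p_cnn = cnn_head(embeddings, rand_matrix(WIDTH * DIM, N_FILTERS),
                 rand_matrix(N_FILTERS, N_CLASSES), width=WIDTH)

for name, probs in (("SLP", p_slp), ("MLP", p_mlp), ("CNN", p_cnn)):
    print(name, [round(p, 3) for p in probs])
```

Each head returns a probability distribution over the classes (two here, as in a binary fake/real setting; LIAR uses more labels). In practice the embeddings would come from the chosen pre-trained model and the weights would be learned on the target dataset.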

Pages: 13

References (41 in total)

[1] Ahmed, Hadeer; Traore, Issa; Saad, Sherif. Detection of Online Fake News Using N-Gram Analysis and Machine Learning Techniques [J]. INTELLIGENT, SECURE, AND DEPENDABLE SYSTEMS IN DISTRIBUTED AND CLOUD ENVIRONMENTS (ISDDC 2017), 2017, 10618: 127-138

[2] Amjad, Maaz; Sidorov, Grigori; Zhila, Alisa; Gomez-Adorno, Helena; Voronkov, Ilia; Gelbukh, Alexander. "Bend the truth": Benchmark dataset for fake news detection in Urdu language and its evaluation [J]. JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2020, 39 (02): 2457-2469

[3] [Anonymous], 2016, BuzzFeed News3 November

[4] [Anonymous], 2017, P 8 INT JOINT C NAT

[5] Antoun Wissam, 2020, 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT), P519, DOI 10.1109/ICIoT48696.2020.9089487

[6] Bojanowski P., 2017, Transactions of the Association for Computational Linguistics, V5, P135, DOI 10.1162/TACL_A_00051

[7] Chollet F., 2015, Keras

[8] Clark Kevin, 2020, ELECTRA: Pretraining text encoders as discriminators rather than generators, DOI 10.48550/arXiv.2003.10555

[9] Dai Z., 2020, ABS200603236 CORR

[10] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
