Text Data Augmentation Techniques for Fake News Detection in the Romanian Language

Cited by: 3
Authors
Bucos, Marian [1]
Tucudean, Georgiana [1]
Affiliations
[1] Politehn Univ Timisoara, Commun Dept, DataLab, 2 Vasile Parvan, Timisoara 300223, Romania
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Keywords
fake news detection; data augmentation; back translation; easy data augmentation; Romanian data set; natural language processing; machine learning
DOI
10.3390/app13137389
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
This paper aims to investigate the use of a Romanian data source, different classifiers, and text data augmentation techniques to implement a fake news detection system. The paper focusses on text data augmentation techniques to improve the efficiency of fake news detection tasks. This study provides two approaches for fake news detection based on content and context features found in the Factual.ro data set. For this purpose, we implemented two data augmentation techniques, Back Translation (BT) and Easy Data Augmentation (EDA), to improve the performance of the models. The results indicate that the implementation of the BT and EDA techniques successfully improved the performance of the classifiers used in our study. The results of our content-based approach show that an Extra Trees Classifier model is the most effective, whether data augmentation is used or not, as it produced the highest accuracy, precision, F1 score, and Kappa. The Random Forest Classifier with BT yielded the best results of the context-based experiment overall, with the highest accuracy, recall, F1 score, and Kappa. Furthermore, we found that BT and EDA led to an increase in the AUC scores of all models in both content-based and context-based data sets.
Pages: 20
Related papers
45 in total
[11]   Extremely randomized trees [J].
Geurts, P ;
Ernst, D ;
Wehenkel, L .
MACHINE LEARNING, 2006, 63 (01) :3-42
[12]  
Ghinadya, 2020, P 2020 8 INT C INFOR, P1
[13]   The Role of Personality and Linguistic Patterns in Discriminating Between Fake News Spreaders and Fact Checkers [J].
Giachanou, Anastasia ;
Rissola, Esteban A. ;
Ghanem, Bilal ;
Crestani, Fabio ;
Rosso, Paolo .
NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2020), 2020, 12089 :181-192
[14]  
Graca M., 2019, GEN BACK TRANSLATION
[15]   Transfer Learning and GRU-CRF Augmentation for Covid-19 Fake News Detection [J].
Karnyoto, Andrea ;
Sun, Chengjie ;
Liu, Bingquan ;
Wang, Xiaolong .
COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2022, 19 (02) :639-658
[16]   AugFake-BERT: Handling Imbalance through Augmentation of Fake News Using BERT to Enhance the Performance of Fake News Classification [J].
Keya, Ashfia Jannat ;
Wadud, Md Anwar Hussen ;
Mridha, M. F. ;
Alatiyyah, Mohammed ;
Hamid, Md Abdul .
APPLIED SCIENCES-BASEL, 2022, 12 (17)
[17]  
Kumar S., 2022, Glob Trans Proc, V3, P289, DOI 10.1016/j.gltp.2022.03.014
[18]  
Kuzmin Gleb, 2020, P 3 INT WORKSHOP RUM, P45
[19]   A survey on addressing high-class imbalance in big data [J].
Leevy J.L. ;
Khoshgoftaar T.M. ;
Bauder R.A. ;
Seliya N. .
Journal of Big Data, 2018, 5 (01)
[20]  
Ma J., 2020, Journal of Physics: Conference Series, V1651