Text Data Augmentation Techniques for Fake News Detection in the Romanian Language

Cited by: 3
Authors
Bucos, Marian [1]
Tucudean, Georgiana [1]
Affiliations
[1] Politehn Univ Timisoara, Commun Dept, DataLab, 2 Vasile Parvan, Timisoara 300223, Romania
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 13
Keywords
fake news detection; data augmentation; back translation; easy data augmentation; Romanian data set; natural language processing; machine learning;
DOI
10.3390/app13137389
Chinese Library Classification (CLC) number
O6 [Chemistry];
Subject classification code
0703;
Abstract
This paper aims to investigate the use of a Romanian data source, different classifiers, and text data augmentation techniques to implement a fake news detection system. The paper focusses on text data augmentation techniques to improve the efficiency of fake news detection tasks. This study provides two approaches for fake news detection based on content and context features found in the Factual.ro data set. For this purpose, we implemented two data augmentation techniques, Back Translation (BT) and Easy Data Augmentation (EDA), to improve the performance of the models. The results indicate that the implementation of the BT and EDA techniques successfully improved the performance of the classifiers used in our study. The results of our content-based approach show that an Extra Trees Classifier model is the most effective, whether data augmentation is used or not, as it produced the highest accuracy, precision, F1 score, and Kappa. The Random Forest Classifier with BT yielded the best results of the context-based experiment overall, with the highest accuracy, recall, F1 score, and Kappa. Furthermore, we found that BT and EDA led to an increase in the AUC scores of all models in both content-based and context-based data sets.
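As a rough illustration of the kind of token-level perturbation that Easy Data Augmentation (EDA) relies on, the sketch below implements two of its four standard operations, random swap and random deletion; synonym replacement and random insertion are omitted because they need a synonym resource. This is a minimal sketch, not the authors' implementation, which this record does not show: the function names, the whitespace tokenization, and the default parameters are all assumptions made for illustration.

```python
import random


def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps random pairwise position swaps."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def random_deletion(tokens, p, rng):
    """Drop each token independently with probability p, keeping at least one."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]


def augment(text, n_swaps=1, p_delete=0.1, seed=0):
    """Produce two augmented variants of text (hypothetical helper).

    Tokenization is plain whitespace splitting, an assumption that ignores
    punctuation and Romanian-specific preprocessing.
    """
    rng = random.Random(seed)
    tokens = text.split()
    return [
        " ".join(random_swap(tokens, n_swaps, rng)),
        " ".join(random_deletion(tokens, p_delete, rng)),
    ]


if __name__ == "__main__":
    for variant in augment("the minister denied the report on Tuesday"):
        print(variant)
```

Each variant keeps the label of the original text, so a labeled training set can be enlarged without new annotation. Back Translation works at the sentence level instead and requires a machine translation system, so it is not sketched here.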
Pages: 20