Abusive Content Detection in Arabic Tweets Using Multi-Task Learning and Transformer-Based Models

Cited by: 4
Authors
Alrashidi, Bedour [1, 2]
Jamal, Amani [1]
Alkhathlan, Ali [1]
Affiliations
[1] King Abdulaziz Univ, Fac Comp & Informat Technol, Dept Comp Sci, Jeddah 21589, Saudi Arabia
[2] Univ Hail, Coll Comp Sci & Engn, Dept Informat & Comp Sci, Hail 55436, Saudi Arabia
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 10
Keywords
abusive content; dialectal Arabic (DA); NLP; DL; multitask learning
DOI
10.3390/app13105825
Chinese Library Classification
O6 [Chemistry];
Discipline classification code
0703;
Abstract
Social media platforms have become increasingly popular in the Arab world in recent years. Their growing use, however, has also brought a new challenge in the form of abusive content, including hate speech, offensive language, and abusive language. Existing research treats automatic abusive content detection as a binary classification problem. In addition, existing work on detecting abusive Arabic content fails to tackle dialect-specific phenomena. These two gaps, the binary framing and the neglect of dialect, remain important issues in automatic abusive Arabic content detection. In this study, we used a multi-aspect annotation schema to tackle automatic abusive content detection in Arab countries, treating it as a multi-class classification task that accounts for dialectal Arabic (DA)-specific phenomena. More precisely, the multi-aspect annotation schema includes five attributes: directness, hostility, target, group, and annotator. We developed a framework to automatically detect abusive content on Twitter using natural language processing (NLP) techniques. The framework applied machine learning (ML) models, deep learning (DL) models, and pretrained Arabic language models (LMs) to the multi-aspect annotation dataset. In addition, to investigate the impact of other approaches such as multi-task learning (MTL), we developed four MTL models built on top of a pretrained DA language model (MARBERT) and trained them on the multi-aspect annotation dataset. Our MTL models and the pretrained Arabic LMs improved performance over the existing DL model reported in the literature.
Pages: 18