Improving hate speech detection using Cross-Lingual Learning

Cited by: 8

Authors:

Firmino, Anderson Almeida [1]

Baptista, Claudio de Souza [1]

de Paiva, Anselmo Cardoso [2]

Affiliations:

[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil

[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil

Source:

Expert Systems with Applications | 2024 / Vol. 235

Keywords:

Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning

DOI:

10.1016/j.eswa.2023.121115

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The worldwide growth of social media has brought both social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach applies transfer learning with Pre-Trained Language Models (PTLMs), leveraging the large corpora available in some languages to solve the task in languages with fewer resources for it. The proposed methodology comprises four stages: corpora acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments with Pre-Trained Language Models for English, Italian, and Portuguese (BERT and XLM-R) to determine which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The experimental results show that the proposed methodology is promising: on the OffComBr-2 corpus, it obtained the best state-of-the-art result (F1-measure = 92%).
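The abstract reports its OffComBr-2 result as an F1-measure, the harmonic mean of precision and recall, F1 = 2PR / (P + R). The minimal Python sketch below shows how that metric is computed. The record does not state which averaging the authors used, so the binary labels (1 = offensive, 0 = not offensive), the macro averaging, the function names `f1_per_class` and `macro_f1`, and the toy predictions are all illustrative assumptions, not the paper's evaluation code.

```python
from typing import Sequence


def f1_per_class(y_true: Sequence[int], y_pred: Sequence[int], label: int) -> float:
    """F1 for one class: harmonic mean of precision and recall, 2PR / (P + R)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # correctly flagged
    fp = sum(1 for t, p in pairs if t != label and p == label)  # wrongly flagged
    fn = sum(1 for t, p in pairs if t == label and p != label)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int], labels=(0, 1)) -> float:
    """Unweighted mean of per-class F1 scores (macro averaging, an assumption here)."""
    return sum(f1_per_class(y_true, y_pred, c) for c in labels) / len(labels)


if __name__ == "__main__":
    # Hypothetical gold labels and model predictions (1 = offensive, 0 = not offensive).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
    print(f1_per_class(y_true, y_pred, 1), f1_per_class(y_true, y_pred, 0))
    print(macro_f1(y_true, y_pred))
```

With these toy labels the offensive class scores F1 = 2/3 and the non-offensive class 0.8, so the macro average is about 0.733. Macro averaging weights both classes equally, which matters on imbalanced hate speech corpora where a classifier could score well on the majority class alone.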

Pages: 13

References: 58