Improving hate speech detection using Cross-Lingual Learning

Cited by: 8

Authors:

Firmino, Anderson Almeida [1]

Baptista, Claudio de Souza [1]

de Paiva, Anselmo Cardoso [2]

Affiliations:

[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil

[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil

Source:

EXPERT SYSTEMS WITH APPLICATIONS | 2024 / Vol. 235

Keywords:

Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning;

DOI:

10.1016/j.eswa.2023.121115

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The growth of social media worldwide has brought both social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach applies transfer learning with Pre-Trained Language Models (PTLMs), using languages for which large corpora are available to solve the task in languages with fewer task-specific resources. The proposed methodology comprises four stages: corpus acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments using Pre-Trained Language Models in English, Italian, and Portuguese (BERT and XLM-R) to verify which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The experimental results show that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved a state-of-the-art result (F1-measure = 92%).
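The F1-measure reported in the abstract is the harmonic mean of precision and recall. The following is a minimal sketch of that computation for a binary hate/non-hate task. It is not the authors' evaluation code: the function name `f1_measure`, the choice to score only the positive class, and the toy labels are assumptions made for illustration, since the abstract does not state the averaging scheme used.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class.

    Assumption: binary labels, scored on the positive (hate) class only.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy labels (1 = hate speech, 0 = not hate speech).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

precision, recall, f1 = f1_measure(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

On these toy labels there are three true positives, one false positive, and one false negative, so precision, recall, and F1 all equal 0.75.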

References

Pages: 13

58 references in total

