Improving hate speech detection using Cross-Lingual Learning

Cited by: 8
Authors
Firmino, Anderson Almeida [1]
Baptista, Claudio de Souza [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Rua Aprigio Veloso 882, Campina Grande, PB, Brazil
[2] Univ Fed Maranhao, Ave Portugueses 1966, Sao Luis, MA, Brazil
Keywords
Hate speech detection; Natural language processing; Social media; Cross-Lingual Learning; Deep learning;
DOI
10.1016/j.eswa.2023.121115
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The growth of social media worldwide has brought social benefits and challenges. One problem we highlight is the proliferation of hate speech on social media. We propose a novel method for detecting hate speech in texts using Cross-Lingual Learning. Our approach uses transfer learning from Pre-Trained Language Models (PTLMs) with large corpora available to solve problems in languages with fewer resources for the specific task. The proposed methodology comprises four stages: corpora acquisition, PTLM definition, training strategies, and evaluation. We carried out experiments using Pre-Trained Language Models in English, Italian, and Portuguese (BERT and XLM-R) to verify which best suited the proposed method. We used corpora in English (WH) and Italian (Evalita 2018) as the source languages and the OffComBr-2 corpus in Portuguese as the target language. The results of the experiments showed that the proposed methodology is promising: on the OffComBr-2 corpus, it achieved a new state-of-the-art result (F1-measure = 92%).
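For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of how a binary F1-measure is computed from gold labels and predictions. It is not the authors' evaluation code: the function name `f1_measure`, the label encoding (1 = hateful, 0 = not hateful), and the toy labels are assumptions made for illustration, and the abstract does not say whether the reported 92% is a per-class, macro, or weighted average.

```python
def f1_measure(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (hypothetical helper, not from the paper)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision and recall are both zero (or undefined); report 0.0 by convention.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels, invented for illustration only (1 = hateful, 0 = not hateful).
gold = [1, 1, 1, 0, 0, 1]
pred = [1, 1, 0, 0, 1, 1]
print(f1_measure(gold, pred))
```

In this toy case there are three true positives, one false positive, and one false negative, so precision and recall are both 3/4 and the F1-measure is also 3/4.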
Pages: 13