A Large Language Model Approach to Detect Hate Speech in Political Discourse Using Multiple Language Corpora

Cited by: 2
Authors
de Oliveira, Aillkeen Bezerra [1]
Baptista, Claudio de Souza [1]
Firmino, Anderson Almeida [1]
de Paiva, Anselmo Cardoso [2]
Affiliations
[1] Univ Fed Campina Grande, Campina Grande, Paraiba, Brazil
[2] Univ Fed Maranhao, Sao Luis, Maranhao, Brazil
Source
39TH ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, SAC 2024 | 2024
Keywords
Hate Speech; Large Language Model; Cross-Lingual Learning; Machine Learning; Natural Language Processing;
DOI
10.1145/3605098.3635964
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification codes
081203; 0835
Abstract
In this era of unprecedented digital connectivity and interactions, the issue of hate speech has become a focal point in societal discussions. The rise of digital communication platforms has fundamentally transformed how hate speech spreads. Online social media and messaging apps have rapidly disseminated hate speech, a problem exacerbated by the internet's anonymity. Computational technology has emerged as a valuable tool for identifying and mitigating hate speech on social media. In this work, we employed five distinct corpora representing the English, Italian, Filipino, German, and Turkish languages. We propose employing a Large Language Model (GPT-3) enhanced with Cross-Lingual Learning to improve hate speech detection in English and Italian. Our investigation uses a strategy named JL/CL+, which combines two approaches: Joint Learning (JL) and Cascade Learning (CL). Even with data that exhibit lexical disparities, our findings demonstrate substantial success, yielding an F1-score of 96.58% for English and 92.05% for Italian.
Pages: 1461-1468
Page count: 8