Context-aware and expert data resources for Brazilian Portuguese hate speech detection

Authors
Vargas, Francielle [1, 2]
Carvalho, Isabelle [1 ]
Pardo, Thiago A. S. [1 ]
Benevenuto, Fabricio [2 ]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, Sao Carlos, Brazil
[2] Univ Fed Minas Gerais, Comp Sci Dept, Belo Horizonte, Brazil
Source
NATURAL LANGUAGE PROCESSING | 2025 / Volume 31 / Issue 02
Keywords
hate speech; Brazilian Portuguese; low-resource languages; reliability; pragmatics
DOI
10.1017/nlp.2024.18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper provides data resources for low-resource hate speech detection. Specifically, we introduce two data resources: (i) the HateBR 2.0 corpus, which is composed of 7,000 comments extracted from Brazilian politicians' accounts on Instagram and manually annotated with a binary class (offensive versus non-offensive) and hate speech targets. It is an updated version of the HateBR corpus, in which highly similar and one-word comments were replaced; and (ii) the multilingual offensive lexicon (MOL), which consists of 1,000 explicit and implicit terms and expressions annotated with context information. The lexicon also comprises native-speaker translations and their cultural adaptations in English, Spanish, French, German, and Turkish. Both the corpus and the lexicon were annotated by three different experts and achieved high inter-annotator agreement. Lastly, we implemented baseline experiments on the proposed data resources. The results demonstrate the reliability of the data, outperforming baseline dataset results in Portuguese, and are promising for hate speech detection in different languages.
Pages: 435-456
Page count: 22