Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources

Authors
Kovács G. [1]
Alonso P. [1]
Saini R. [1]
Affiliation
[1] Luleå University of Technology, Aurorum 1, Luleå
Keywords
BERT; Deep language processing; Hate speech; Transfer learning; Vocabulary augmentation
DOI
10.1007/s42979-021-00457-3
Abstract
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society and to severely harm marginalized people or groups. Social media is a major arena for spreading hate speech online, and this makes automatic detection considerably harder: posts include paralinguistic signals (e.g., emoticons and hashtags), and their linguistic content contains a great deal of poorly written text. A further difficulty is the context-dependent nature of the task and the lack of consensus on what constitutes hate speech, which make the task difficult even for humans. As a result, creating large labeled corpora is difficult and resource-consuming. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that can efficiently learn a wide variety of features. We therefore propose a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. Applied to the HASOC2019 corpus, our model attained a macro F1 score of 0.63 for hate speech detection on the HASOC test set. However, the capacity of DNNs for efficient learning also increases the risk of overfitting, particularly when little training data is available, as was the case for HASOC. For this reason, we investigated different methods for expanding the resources used, including leveraging unlabeled data and similarly labeled corpora, as well as the use of novel models. Our results show that these approaches can significantly increase the classification score attained. © 2021, The Author(s).
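
The macro F1 score reported in the abstract is the unweighted mean of the per-class F1 scores, where each class's F1 is 2PR / (P + R) from its precision P and recall R. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy labels `HOF` (hateful/offensive) and `NOT` are illustrative assumptions.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (macro-averaged F1)."""
    # Score every label that appears in either the gold or the predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold, pred in zip(y_true, y_pred):
        if gold == pred:
            tp[gold] += 1
        else:
            fp[pred] += 1  # predicted class receives a false positive
            fn[gold] += 1  # gold class receives a false negative
    f1_scores = []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1_scores.append(2 * precision * recall / pr_sum if pr_sum else 0.0)
    # Macro averaging: every class counts equally, regardless of its frequency.
    return sum(f1_scores) / len(f1_scores)


# Hypothetical gold labels and model predictions for five posts.
y_true = ["HOF", "NOT", "NOT", "HOF", "NOT"]
y_pred = ["HOF", "NOT", "HOF", "NOT", "NOT"]
print(round(macro_f1(y_true, y_pred), 3))  # prints 0.583
```

In this toy example accuracy is 3/5 = 0.6, while macro F1 is 7/12 ≈ 0.583, because the F1 of the smaller `HOF` class (0.5) weighs as much as that of `NOT` (about 0.667). This equal weighting is why macro F1 is a common choice when hateful posts are the minority class.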