Challenges of Hate Speech Detection in Social Media: Data Scarcity, and Leveraging External Resources

Authors
Kovács G. [1]
Alonso P. [1]
Saini R. [1]
Affiliation
[1] Luleå University of Technology, Aurorum 1, Luleå
Keywords
BERT; Deep language processing; Hate speech; Transfer learning; Vocabulary augmentation
DOI
10.1007/s42979-021-00457-3
Abstract
The detection of hate speech in social media is a crucial task. The uncontrolled spread of hate has the potential to gravely damage our society and to severely harm marginalized people or groups. A major arena for spreading hate speech online is social media. This contributes significantly to the difficulty of automatic detection, as social media posts include paralinguistic signals (e.g., emoticons and hashtags), and their linguistic content contains plenty of poorly written text. Another difficulty is the context-dependent nature of the task and the lack of consensus on what constitutes hate speech, which makes the task difficult even for humans. This in turn makes creating large labeled corpora difficult and resource-intensive. The problem posed by ungrammatical text has been largely mitigated by the recent emergence of deep neural network (DNN) architectures that have the capacity to efficiently learn various features. For this reason, we proposed a deep natural language processing (NLP) model, combining convolutional and recurrent layers, for the automatic detection of hate speech in social media data. We applied our model to the HASOC2019 corpus and attained a macro F1 score of 0.63 in hate speech detection on the HASOC test set. The capacity of DNNs for efficient learning, however, also means an increased risk of overfitting, particularly when limited training data is available (as was the case for HASOC). For this reason, we investigated different methods for expanding the resources used. We explored various options, such as leveraging unlabeled data and similarly labeled corpora, as well as the use of novel models. Our results showed that by doing so, it was possible to significantly increase the classification score attained. © 2021, The Author(s).
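The abstract reports its result as a macro F1 score. As a minimal sketch of what that metric measures (this is not the authors' evaluation code, and the toy labels below are hypothetical, not drawn from HASOC), macro F1 is the unweighted mean of the per-class F1 scores, so a classifier that ignores the minority class is penalized even when its accuracy looks acceptable:

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    scores = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels (assumption for illustration): 1 = hateful, 0 = not.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0]  # always predicts the majority class

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro F1 = {macro_f1(y_true, y_pred):.2f}")
```

On this toy input the majority-class predictor reaches 75% accuracy but a macro F1 of only about 0.43, because the F1 for the hateful class is zero. This is one reason macro-averaged scores are commonly preferred for imbalanced tasks such as hate speech detection.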