Multilingual Hate Speech Detection Using Semi-supervised Generative Adversarial Network

Cited by: 1
Authors
Mnassri, Khouloud [1]
Farahbakhsh, Reza [1]
Crespi, Noel [1]
Affiliation
[1] Inst Polytech Paris, Samovar, Telecom SudParis, F-91120 Palaiseau, France
Source
COMPLEX NETWORKS & THEIR APPLICATIONS XII, VOL 4, COMPLEX NETWORKS 2023 | 2024 / Vol. 1144
Keywords
Hate Speech; offensive language; semi-supervised; GAN; mBERT; multilingual; social media
DOI
10.1007/978-3-031-53503-1_16
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Online communication has overcome linguistic and cultural barriers, enabling global connection through social media platforms. However, this linguistic variety has introduced new challenges for tasks such as hate speech detection. Although multiple NLP solutions based on advanced machine learning techniques have been proposed, the scarcity of annotated data remains a serious problem, which calls for semi-supervised approaches. This paper proposes SS-GAN-mBERT, a multilingual semi-supervised model that combines Generative Adversarial Networks (GAN) with mBERT. Using only 20% of the labeled data from the HASOC2019 dataset, we detect hate speech in three Indo-European languages: English, German, and Hindi. Our approach outperforms the baseline in multilingual, zero-shot cross-lingual, and monolingual settings, achieving on average a 9.23% gain in F1 score and a 5.75% gain in accuracy over the baseline mBERT model.
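The abstract does not spell out SS-GAN-mBERT's training objective. As a rough illustration only, the sketch below assumes the standard semi-supervised GAN formulation in the style of GAN-BERT (Croce et al., 2020): a discriminator on top of mBERT sentence representations predicts k real classes (e.g. hate / not hate) plus one extra "fake" class, and the generator is trained with an adversarial term plus feature matching. The function names, the "last logit index means fake" convention, and the `eps` smoothing are assumptions; plain Python lists stand in for the logits and hidden features a real model would produce.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a single logit vector."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean(values: Sequence[float]) -> float:
    """Average of a list; returns 0.0 for an empty list."""
    return sum(values) / len(values) if values else 0.0


# Assumed convention (hypothetical): each logit vector has k + 1 entries and
# the LAST entry is the "fake" class; labels are real-class indices 0..k-1.
def discriminator_loss(
    labeled_logits: Sequence[Sequence[float]],
    labels: Sequence[int],
    unlabeled_logits: Sequence[Sequence[float]],
    fake_logits: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    # Supervised term: cross-entropy over the k real classes only, i.e. the
    # conditional p(y | x, y is a real class), on labeled examples.
    sup = mean([-math.log(softmax(lg[:-1])[y] + eps)
                for lg, y in zip(labeled_logits, labels)])
    # Unsupervised real term: every real example, labeled or not, should
    # receive a low "fake" probability.
    real = list(labeled_logits) + list(unlabeled_logits)
    unsup_real = mean([-math.log(1.0 - softmax(lg)[-1] + eps) for lg in real])
    # Unsupervised fake term: generated examples should be classified as fake.
    unsup_fake = mean([-math.log(softmax(lg)[-1] + eps) for lg in fake_logits])
    return sup + unsup_real + unsup_fake


def generator_loss(
    fake_logits: Sequence[Sequence[float]],
    real_features: Sequence[Sequence[float]],
    fake_features: Sequence[Sequence[float]],
    eps: float = 1e-8,
) -> float:
    # Adversarial term: the generator wants its samples to get a low
    # "fake" probability from the discriminator.
    adv = mean([-math.log(1.0 - softmax(lg)[-1] + eps) for lg in fake_logits])
    # Feature matching: squared L2 distance between the mean real and mean
    # fake feature vectors taken from a discriminator hidden layer.
    dim = len(real_features[0])
    mu_real = [mean([f[i] for f in real_features]) for i in range(dim)]
    mu_fake = [mean([f[i] for f in fake_features]) for i in range(dim)]
    fm = sum((a - b) ** 2 for a, b in zip(mu_real, mu_fake))
    return adv + fm
```

With uniform logits over k = 2 classes plus "fake", the discriminator loss is ln 2 + ln 1.5 + ln 3 = ln 9, roughly 2.197. A real implementation would compute these terms on batched tensors in a deep-learning framework and backpropagate them through the generator, the discriminator, and mBERT.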
Pages: 192-204
Page count: 13