Multilingual Hate Speech Detection Using Semi-supervised Generative Adversarial Network

Cited by: 1

Authors:

Mnassri, Khouloud [1]

Farahbakhsh, Reza [1]

Crespi, Noel [1]

Affiliations:

[1] Inst Polytech Paris, Samovar, Telecom SudParis, F-91120 Palaiseau, France

Source:

COMPLEX NETWORKS & THEIR APPLICATIONS XII, VOL 4, COMPLEX NETWORKS 2023 | 2024 / Vol. 1144

Keywords:

Hate Speech; offensive language; semi-supervised; GAN; mBERT; multilingual; social media;

DOI:

10.1007/978-3-031-53503-1_16

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Online communication has overcome linguistic and cultural barriers, enabling global connection through social media platforms. However, linguistic variety introduces additional challenges in tasks such as hate speech detection. Although multiple NLP solutions have been proposed using advanced machine learning techniques, the scarcity of annotated data remains a serious problem, motivating the use of semi-supervised approaches. This paper proposes an innovative solution: a multilingual semi-supervised model based on Generative Adversarial Networks (GAN) and mBERT, namely SS-GAN-mBERT. We detect hate speech in three Indo-European languages (English, German, and Hindi) using only 20% labeled data from the HASOC2019 dataset. Our approach excelled in the multilingual, zero-shot cross-lingual, and monolingual paradigms, achieving, on average, a 9.23% F1 score boost and a 5.75% accuracy increase over the baseline mBERT model.

Citation

Pages: 192-204

Page count: 13

References: 27 in total

