Adaptive ensemble techniques leveraging BERT based models for multilingual hate speech detection in Korean and English

Cited by: 0

Authors:

Yoo, Seohyun [1]

Jeon, Eunbae [1]

Hyeon, Joonseo [1]

Cho, Jaehyuk [1]

Affiliations:

[1] Jeonbuk Natl Univ, Dept Software Engn, Jeonju, South Korea

Source:

SCIENTIFIC REPORTS | 2025 / Vol. 15 / Issue 01

Funding:

National Research Foundation, Singapore;

Keywords:

BERT-based models; Multilingual detection; Hate speech; Ensemble learning; Parallel Model Fusion (PMF);

DOI:

10.1038/s41598-025-88960-y

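Chinese Library Classification (CLC) number:

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];

Discipline classification codes: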

07 ; 0710 ; 09 ;

Abstract:

Online hate speech has become a major social problem owing to the rapid growth of Internet communities. Relying on anonymity, people direct hateful or abusive language at groups that differ from them. As these terms vary by region and are reflected in local languages, it is important to build robust hate speech detection models for each local language. We propose an ensemble of several Bidirectional Encoder Representations from Transformers (BERT)-based models to enhance English and Korean hate speech detection. Parallel Model Fusion (PMF) requires the results of BERT-based models and a final estimator called a meta-learner. During each round of cross-validation, the validation and testing results were used to train and test PMF. PMF test data are calculated using Majority Voting Integration or Weighted Probabilistic Averaging. Popular machine learning algorithms such as Random Forest, Logistic Regression, Gaussian Naïve Bayes, and Support Vector Machine are employed as meta-learners for PMF. The proposed model outperformed previous studies and the single-model approach in English and Korean, with accuracies of 85% and 89%, respectively. This study demonstrates improved automatic hate speech detection and encourages not only studies on English hate speech detection but also further work on non-English hate speech detection.
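The abstract names two ways of combining base-model outputs: Majority Voting Integration and Weighted Probabilistic Averaging. The sketch below illustrates the general idea of each in plain Python. It is not the authors' implementation: the function names, the tie-breaking rule, the example weights, and the sample predictions are all assumptions made for illustration, and the meta-learner stage that PMF places on top of these fused outputs is omitted.

```python
from collections import Counter


def majority_vote(labels_per_model):
    """Fuse hard labels from several models into one label per sample.

    labels_per_model: one list of labels per model, all of equal length.
    Ties go to the smallest label (an assumption, not from the paper).
    """
    fused = []
    for votes in zip(*labels_per_model):  # votes for a single sample
        counts = Counter(votes)
        top = max(counts.values())
        fused.append(min(label for label, n in counts.items() if n == top))
    return fused


def weighted_probability_average(probs_per_model, weights):
    """Average class probabilities across models with normalized weights.

    probs_per_model: models -> samples -> per-class probabilities.
    weights: one non-negative weight per model (hypothetical values).
    """
    total = sum(weights)
    norm = [w / total for w in weights]
    fused = []
    for sample_probs in zip(*probs_per_model):  # one entry per model
        n_classes = len(sample_probs[0])
        fused.append([
            sum(w * p[c] for w, p in zip(norm, sample_probs))
            for c in range(n_classes)
        ])
    return fused


# Hypothetical outputs of three base models on four samples (1 = hate, 0 = not).
hard_labels = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
print(majority_vote(hard_labels))  # [1, 1, 1, 0]

# Hypothetical class probabilities [P(not hate), P(hate)] for one sample.
soft_probs = [[[0.9, 0.1]], [[0.4, 0.6]]]
print(weighted_probability_average(soft_probs, weights=[3, 1]))
```

In a stacking-style setup such as the one the abstract describes, fused outputs like these would serve as input features for a meta-learner (for example Logistic Regression or Random Forest); how the paper derives its weights is not stated in the abstract.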

Pages: 20
