Constructing ensembles for hate speech detection

Cited by: 0
Authors
Kucukkaya, Izzet Emre [1]
Toraman, Cagri [2]
Affiliations
[1] Tech Univ Munich, Sch Computat Informat & Technol, Munich, Germany
[2] Middle East Tech Univ, Comp Engn Dept, Ankara, Turkiye
Source
NATURAL LANGUAGE PROCESSING | 2024
Keywords
Hate speech detection; ensemble learning; text classification; online social networks; offensive content;
DOI
10.1017/nlp.2024.44
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Hate speech against individuals and groups with certain demographics is a major issue in social media. Supervised models for hate speech detection mostly utilize labeled data collections to understand textual semantics. However, hate speech detection is a complex task that involves several aspects, including topic and writing style. The complexity of hate speech can be represented by an ensemble of models learned from different aspects of data. Moreover, ensemble members or base models can be modified to give attention to particular aspects of hate speech. In this study, we extract different aspects of hate speech to construct ensembles, thereby improving the performance of hate speech detection by ensemble learning. We conduct detailed experiments on five datasets in multiple languages to generalize our observations. The experimental results, supported by statistical significance tests, show that the performance of hate speech detection can be improved by capturing multiple aspects of hate speech. Our ensemble construction approach outperforms the baselines in terms of the F1 score of the Hate class in 80% of the cases, and the Offensive class in 75% of the cases. We also compare our approach with state-of-the-art ensemble methods from shared tasks and find that our highest-performing method can improve the performance of the Hate class in two out of three datasets. We further discuss our approach and experimental results in terms of ensemble parameters and writing style among ensemble members.
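As a rough illustration of the two ideas the abstract relies on (combining base models into an ensemble and scoring a single class with F1), here is a minimal sketch. It is not the authors' method: majority voting is swapped in as a generic combination rule, the three base models and their predictions are made-up toy data, and the "normal" label and all function names are assumptions introduced for the example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per example."""
    combined = []
    for labels in zip(*predictions):
        # Counter.most_common keeps first-seen order among equal counts,
        # so the earliest base model wins a tie.
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined


def f1_for_class(gold, predicted, target):
    """F1 score of one class (e.g. "hate"), treating all others as negative."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): gold labels and three base models' predictions.
gold = ["hate", "normal", "offensive", "hate", "normal"]
model_a = ["hate", "normal", "normal", "hate", "normal"]
model_b = ["hate", "offensive", "offensive", "normal", "normal"]
model_c = ["normal", "normal", "offensive", "hate", "hate"]

ensemble = majority_vote([model_a, model_b, model_c])

for name, preds in [("model_c", model_c), ("ensemble", ensemble)]:
    print(name, "F1(hate) =", round(f1_for_class(gold, preds, "hate"), 3))
```

In this toy setup each base model errs on different examples, so the vote can correct individual mistakes; real gains depend on how diverse and how accurate the base models are.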
Pages: 26