Improving Hate Speech Detection: The Impact of Semantic Representations and Preprocessing Techniques

Times cited: 0

Authors:

Bolucu, Necva [1]

Ozerdem, Aysegul [1]

Affiliation:

[1] CSIRO, DATA61, Sydney, Australia

Source:

2023 31st Signal Processing and Communications Applications Conference, SIU | 2023

Keywords:

social media; hate speech; semantic; API; preprocessing;

DOI:

10.1109/SIU59756.2023.10224051

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Social media is one of the important tools for measuring the pulse of a society. However, when it is used to produce hate speech targeting an individual or a group, it becomes a phenomenon that can lead to social problems. In this context, the detection of hate speech is crucial. In this study, prepared for the hate speech detection shared task at SIU 2023 NST, the importance of semantic representations obtained through the OpenAI API for detecting hate speech effectively is investigated. As preprocessing steps, the dataset is normalized, an emoji dictionary is applied, and the SMOTE technique is used to address the imbalanced dataset. To demonstrate the importance of this step for the problem, basic machine learning techniques, namely SVM and cosine similarity, are used. The experimental results show that semantic representations combined with machine learning models offer a successful solution to the problem. In particular, the preprocessing step applied to the imbalanced dataset contributes substantially to the results.
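The abstract names the pipeline's ingredients only at a high level. The sketch below illustrates them under stated assumptions, using only the Python standard library: an emoji-to-word dictionary plus text normalization, a simplified SMOTE that oversamples the minority class by interpolating between nearest minority neighbours, and a cosine-similarity classifier. The emoji entries, the normalization rules, the toy 3-dimensional vectors standing in for OpenAI API embeddings, and the choice to use cosine similarity as a nearest-centroid classifier are all hypothetical; the record does not say how the authors implemented these steps, and the SVM is not sketched.

```python
import math
import random
import re

# Hypothetical emoji-to-word dictionary; the record does not list the authors' entries.
EMOJI_DICT = {"😡": " angry ", "😂": " laughing "}


def normalize(text):
    """Assumed normalization: emojis -> words, lowercase, mask mentions, squeeze spaces."""
    for emoji, word in EMOJI_DICT.items():
        text = text.replace(emoji, word)
    text = text.lower()
    text = re.sub(r"@\w+", "@user", text)  # replace any user mention with a placeholder
    return re.sub(r"\s+", " ", text).strip()


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority neighbours (Euclidean)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(x, nb)])
    return synthetic


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def predict(x, centroids):
    """Return the label whose class centroid is most cosine-similar to x."""
    return max(centroids, key=lambda label: cosine(x, centroids[label]))


if __name__ == "__main__":
    print(normalize("@someone  This is SO annoying 😡"))  # -> @user this is so annoying angry
    # Toy 3-d vectors standing in for OpenAI API embeddings (hypothetical values).
    hate = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
    non_hate = [[0.1, 0.9, 0.2], [0.0, 0.8, 0.3], [0.2, 0.7, 0.1], [0.1, 0.95, 0.0]]
    # Oversample the minority class until both classes have the same size.
    hate_balanced = hate + smote(hate, len(non_hate) - len(hate), k=1)
    centroids = {"hate": centroid(hate_balanced), "non-hate": centroid(non_hate)}
    print(predict([0.85, 0.15, 0.05], centroids))  # -> hate
    print(predict([0.1, 0.9, 0.1], centroids))  # -> non-hate
```

In a real experiment, scikit-learn's `sklearn.svm.SVC` and imbalanced-learn's `imblearn.over_sampling.SMOTE` (via `fit_resample`) are standard replacements for the hand-written parts. Oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation.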

References

Pages: 4

19 references in total

[1] Badjatiya P, Gupta S, Gupta M, Varma V, 2017. Deep Learning for Hate Speech Detection in Tweets. WWW'17 Companion: Proceedings of the 26th International Conference on World Wide Web, pp. 759-760.
[2] Beyhan F, 2022. LREC 2022: Thirteenth International Conference on Language Resources and Evaluation, p. 4177.
[3] Chinchor N, 1992. Fourth Message Understanding Conference (MUC-4), p. 30.
[4] Dagasan T, 2019. Master's thesis.
[5] Devlin J, 2019. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL HLT 2019), Vol. 1, p. 4171.
[6] Fernandez A, Garcia S, Herrera F, Chawla NV, 2018. SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary. Journal of Artificial Intelligence Research, 61: 863-905.
[7] González-Carvajal S, 2021. arXiv, DOI arXiv:2005.13012.
[8] Hearst MA, 1998. Support Vector Machines. IEEE Intelligent Systems & Their Applications, 13(4): 18-21.
[9] Husunbeyi ZM, 2020. Ph.D. dissertation.
[10] Kamalloo E, 2023. arXiv, DOI arXiv:2305.06300.
