Tracking Hate in Social Media: Evaluation, Challenges and Approaches

Cited by: 0
Authors
Modha S. [1, 3]
Mandl T. [2]
Majumder P. [1]
Patel D. [3]
Affiliations
[1] DA-IICT, Gandhinagar
[2] University of Hildesheim, Hildesheim
[3] LDRP-ITR, Gandhinagar
Keywords
Deep learning; Evaluation; Hate speech; Natural language processing; Text classification
DOI
10.1007/s42979-020-0082-0
Abstract
This paper presents online hate speech as a societal and computational challenge. Offensive content detection in social media is treated as a multilingual, multi-level, multi-class classification problem for three Indo-European languages. This research problem is offered to the community through the HASOC shared task, which aims to stimulate research and development in hate speech recognition across different languages. Three datasets (in English, German, and Hindi) were developed from Twitter and Facebook and made available. This paper describes the creation of the multilingual datasets and the annotation method. We present the numerous approaches based on traditional classifiers, deep neural models, and transfer learning models, along with the features used for classification. Results show that the best classifier for binary classification might not perform best in multi-class classification, and that the performance of the same classifier varies across languages. Overall, transfer learning models such as BERT and deep neural models based on LSTMs and CNNs perform similarly to one another, and better than traditional classifiers such as SVM. We conclude the discussion with a list of issues that need to be addressed in future datasets. © 2020, Springer Nature Singapore Pte Ltd.