Bangla hate speech detection on social media using attention-based recurrent neural network

Cited by: 41
Authors
Das, Amit Kumar [1]
Al Asif, Abdullah [1]
Paul, Anik [1]
Hossain, Md Nur [1]
Affiliation
[1] East West University, Department of Computer Science and Engineering (CSE), Dhaka, Bangladesh
Keywords
RNN; attention mechanism; LSTM; GRU; Bangla text classification; Bangla hate speech detection
DOI
10.1515/jisys-2020-0060
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Hate speech has spread rapidly through the everyday use of technology, most notably through people sharing negative opinions or feelings on social media. Although many studies have addressed hate speech detection in English, German, and other languages, very few have addressed Bengali, even though millions of people communicate on social media in Bengali. The few existing works leave room for improvement in both accuracy and interpretability. This article proposes an encoder-decoder-based machine learning model, a popular approach in NLP, to classify users' Bengali comments from Facebook pages. A dataset of 7,425 Bengali comments spanning seven distinct categories of hate speech was used to train and evaluate the model. Local features of the comments were extracted and encoded with 1D convolutional layers. Finally, attention-based, LSTM-based, and GRU-based decoders were used to predict the hate speech category. Among the three encoder-decoder models, the attention-based decoder achieved the best accuracy (77%).
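The pipeline summarized in the abstract has two stages: 1D convolutional feature extraction, then an attention-based decoder that predicts one of seven categories. A minimal, dependency-free Python sketch can illustrate how these stages fit together. It is not the authors' implementation. The embedding size, kernel width, number of filters, and attention size are assumptions, as is the additive (Bahdanau-style) scoring function. All helper names (`conv1d`, `attention_pool`, `make_params`, `forward`) are hypothetical, and the weights are random rather than trained. The LSTM/GRU layers are omitted. The sketch only shows how the convolutional features of one comment are scored, pooled into a single context vector, and mapped to probabilities over seven categories.

```python
import math
import random


def conv1d(seq, kernels, bias):
    """'Valid' 1D convolution followed by ReLU.
    seq: T token vectors of size D; kernels: F filters, each K x D.
    Returns T-K+1 feature vectors of size F (local n-gram features)."""
    K = len(kernels[0])
    out = []
    for t in range(len(seq) - K + 1):
        feats = []
        for f, kernel in enumerate(kernels):
            s = bias[f]
            for k in range(K):
                s += sum(w * x for w, x in zip(kernel[k], seq[t + k]))
            feats.append(max(0.0, s))  # ReLU
        out.append(feats)
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, W, v):
    """Assumed additive attention: score_t = v . tanh(W h_t),
    alpha = softmax(scores), context = sum_t alpha_t * h_t."""
    scores = []
    for h in features:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h))) for row in W]
        scores.append(sum(a * b for a, b in zip(v, hidden)))
    alphas = softmax(scores)
    dim = len(features[0])
    context = [sum(alphas[t] * features[t][i] for t in range(len(features)))
               for i in range(dim)]
    return context, alphas


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def make_params(rng, D=8, K=3, F=16, A=10, C=7):
    """Hypothetical sizes: embedding D, kernel width K, filters F,
    attention size A, and C = 7 hate speech categories."""
    return {
        "kernels": [rand_matrix(K, D, rng) for _ in range(F)],
        "conv_bias": [0.0] * F,
        "W_att": rand_matrix(A, F, rng),
        "v_att": [rng.uniform(-0.5, 0.5) for _ in range(A)],
        "W_out": rand_matrix(C, F, rng),
        "b_out": [0.0] * C,
    }


def forward(comment, p):
    """Conv encoder -> attention pooling -> softmax over categories."""
    feats = conv1d(comment, p["kernels"], p["conv_bias"])
    context, alphas = attention_pool(feats, p["W_att"], p["v_att"])
    logits = [sum(w * x for w, x in zip(row, context)) + b
              for row, b in zip(p["W_out"], p["b_out"])]
    return softmax(logits), alphas


rng = random.Random(0)
params = make_params(rng)
comment = rand_matrix(12, 8, rng)  # random stand-in for 12 embedded tokens
probs, alphas = forward(comment, params)
print(len(probs), len(alphas))  # 7 10
```

The attention weights `alphas` (one per convolution window) sum to 1. This is what makes attention-based models somewhat interpretable: they indicate which parts of a comment contributed most to the predicted category.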
Pages: 578-591
Number of pages: 14