FA-Net: fused attention-based network for Hindi English code-mixed offensive text classification

Cited by: 6
Authors
Mundra, Shikha [1, 2]
Mittal, Namita [1]
Affiliations
[1] Malaviya Natl Inst Technol MNIT, Jaipur, Rajasthan, India
[2] Manipal Univ Jaipur, Jaipur, Rajasthan, India
Keywords
Embedding; Fusion; Comprehensive representation; Hindi English code-mixed; Offensive
DOI
10.1007/s13278-022-00929-1
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Widespread use of social media platforms such as Twitter, Facebook, and YouTube allows opinions and suggestions to be shared across countries. However, these platforms are often misused to disseminate hate speech and offensive content. Moreover, in a multilingual society such as India, many users resort to code-mixing while typing on social media. We therefore focus on Hindi English (Hi-En) code-mixed hate speech and offensive text classification. Numerous approaches have emerged recently, and most of them stack CNN and LSTM layers to extract local and sequential semantic features. However, these stacked arrangements diminish the comprehensive effect of local and sequential features. In addition, deep frameworks suffer from the vanishing gradient problem. We therefore propose a local and sequential knowledge-aware Fused Attention-based Network (FA-Net), which introduces a fusion of attention mechanisms for collective and mutual learning between local and sequential features. FA-Net is shallower and wider than existing architectures. It has three building blocks: Code Mixed Hybrid Embedding (CMHE), Locally Driven Sequential Attention-2 (LDSA-2), and Locally Driven Sequential Attention-3 (LDSA-3). CMHE is developed using customized Hi-En code-mixed data so that the network is initialized with relevant weights. LDSA-2 and LDSA-3 equip the model to build a comprehensive representation that carries past, future, and local contextual knowledge with respect to any location in the sentence. Extensive experiments on two benchmark datasets show that FA-Net outperforms other state-of-the-art methods.
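The contrast the abstract draws between stacking the local and sequential extractors and running them side by side can be illustrated with a toy sketch. This is not the authors' FA-Net, CMHE, or LDSA implementation: the function names, the window-mean stand-in for a CNN filter, the running-average stand-in for a bidirectional LSTM, the scalar token features, and the product-based attention score are all assumptions made purely for illustration.

```python
import math

def local_features(x, width=3):
    """Local branch: mean over a centered window (hypothetical stand-in for a CNN filter)."""
    half = width // 2
    out = []
    for i in range(len(x)):
        window = x[max(0, i - half): i + half + 1]
        out.append(sum(window) / len(window))
    return out

def sequential_features(x, decay=0.5):
    """Sequential branch: forward plus backward running states (hypothetical stand-in for a BiLSTM)."""
    fwd, state = [], 0.0
    for v in x:
        state = decay * state + (1 - decay) * v
        fwd.append(state)
    bwd, state = [], 0.0
    for v in reversed(x):
        state = decay * state + (1 - decay) * v
        bwd.append(state)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fused_attention(x):
    """Run both branches on the same input in parallel, then fuse them with attention weights."""
    loc = local_features(x)
    seq = sequential_features(x)
    # Assumed scoring rule: a position scores high when both branches respond strongly.
    weights = softmax([l * s for l, s in zip(loc, seq)])
    # Attention-weighted summary of each branch, kept side by side rather than stacked.
    fused_local = sum(w * l for w, l in zip(weights, loc))
    fused_seq = sum(w * s for w, s in zip(weights, seq))
    return weights, (fused_local, fused_seq)

# Toy per-token "offensiveness" scores for a five-token sentence (made-up values).
weights, summary = fused_attention([0.1, 0.9, 0.8, 0.05, 0.0])
```

In this sketch both branches read the original input directly, so neither sees only the other's output, which is the sense in which a parallel design is wider and shallower than a stacked one.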
Pages: 14