Context-Based Feature Technique for Sarcasm Identification in Benchmark Datasets Using Deep Learning and BERT Model

Cited by: 45
Authors
Eke, Christopher Ifeanyi [1 ,2 ]
Norman, Azah Anir [1 ]
Shuib, Liyana [1 ]
Affiliations
[1] Univ Malaya, Fac Comp Sci & Informat Technol, Dept Informat Syst, Kuala Lumpur 50603, Malaysia
[2] Fed Univ, Fac Comp, Dept Comp Sci, PMB 046, Lafia, Nigeria
Keywords
Feature extraction; Sentiment analysis; Deep learning; Context modeling; Semantics; Bit error rate; Social networking (online); Natural language processing; sarcasm identification; Bi-LSTM; GloVe embedding; BERT; SENTIMENT ANALYSIS; BIDIRECTIONAL LSTM; CLASSIFICATION;
DOI
10.1109/ACCESS.2021.3068323
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Sarcasm is a complicated linguistic phenomenon commonly found on e-commerce and social media sites. Failure to identify sarcastic utterances in Natural Language Processing applications such as sentiment analysis and opinion mining confuses classification algorithms and generates false results. Several studies on sarcasm detection have utilised different learning algorithms. However, most of these learning models focus only on the content of the expression and leave contextual information aside; as a result, they fail to capture the contextual cues in sarcastic expressions. Secondly, many deep learning methods in NLP use word-embedding learning algorithms as the standard approach for feature vector representation, which ignores the sentiment polarity of the words in a sarcastic expression. To address these issues, this study proposes a context-based feature technique for sarcasm identification using a deep learning model, the BERT model, and conventional machine learning. Two Twitter benchmark datasets and the Internet Argument Corpus, version two (IAC-v2), were utilised for classification with the three learning models. The first model uses an embedding-based representation via a deep learning model with bidirectional long short-term memory (Bi-LSTM), a variant of the recurrent neural network (RNN), applying Global Vectors for word representation (GloVe) to construct the word embeddings and learn context. The second model is Transformer-based, using pre-trained Bidirectional Encoder Representations from Transformers (BERT). The third model is based on feature fusion, combining BERT features, sentiment-related features, syntactic features, and GloVe embedding features with conventional machine learning. The effectiveness of the technique was tested in various evaluation experiments. The technique attained the highest precision of 98.5% and 98.0% on the two Twitter benchmark datasets, respectively, and the highest precision of 81.2% on the IAC-v2 dataset, demonstrating the advantage of the proposed technique over baseline approaches for sarcasm analysis.
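As a rough illustration of the first model described in the abstract (a Bi-LSTM classifier over GloVe-initialised word embeddings), the minimal sketch below uses Python with TensorFlow/Keras. The vocabulary size, sequence length, embedding dimension, layer widths, and file names are illustrative assumptions, not values taken from the paper.

import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20_000   # assumed vocabulary size (not from the paper)
MAX_LEN = 40          # assumed maximum tweet length in tokens
EMBED_DIM = 100       # assumed GloVe dimensionality (e.g. glove.twitter.27B.100d)

def load_glove_matrix(path, word_index):
    # Build a (VOCAB_SIZE x EMBED_DIM) matrix from a GloVe text file;
    # rows for words missing from GloVe stay zero.
    matrix = np.zeros((VOCAB_SIZE, EMBED_DIM), dtype="float32")
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            idx = word_index.get(parts[0])
            if idx is not None and idx < VOCAB_SIZE:
                matrix[idx] = np.asarray(parts[1:], dtype="float32")
    return matrix

def build_bilstm_classifier(embedding_matrix):
    # GloVe embedding layer -> Bi-LSTM for bidirectional context -> sigmoid output
    model = models.Sequential([
        layers.Input(shape=(MAX_LEN,)),
        layers.Embedding(
            VOCAB_SIZE, EMBED_DIM,
            embeddings_initializer=tf.keras.initializers.Constant(embedding_matrix),
            trainable=False),                    # keep pretrained GloVe vectors fixed
        layers.Bidirectional(layers.LSTM(128)),  # forward and backward context learning
        layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),   # sarcastic vs. non-sarcastic
    ])
    model.compile(optimizer="adam",
                  loss="binary_crossentropy",
                  metrics=["accuracy", tf.keras.metrics.Precision()])
    return model

The second and third models (pre-trained BERT and feature fusion with conventional machine learning) would replace the embedding and Bi-LSTM layers with transformer-derived or hand-crafted features, respectively; the paper should be consulted for the exact configurations.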
Pages: 48501-48518
Number of pages: 18