Arabic sarcasm detection: An enhanced fine-tuned language model approach

Cited by: 11
Authors
Galal, Mohamed A. [1 ,5 ]
Yousef, Ahmed Hassan [2 ,3 ]
Zayed, Hala H. [2 ,4 ]
Medhat, Walaa [1 ,4 ]
Affiliations
[1] Nile Univ, Informat Technol & Comp Sci Sch, Giza, Egypt
[2] Egypt Univ Informat, Fac Engn, Cairo, Egypt
[3] Ain Shams Univ, Fac Engn, Cairo, Egypt
[4] Benha Univ, Fac Comp & Artificial Intelligence, Banha, Egypt
[5] ITWorx, Cairo, Egypt
Keywords
Irony detection; Sarcasm detection; Arabic tweets; Deep learning; Language models; Transformer-based models; Natural language processing; IRONY DETECTION; BERT;
DOI
10.1016/j.asej.2024.102736
Chinese Library Classification
T [Industrial Technology];
Subject Classification Code
08;
Abstract
Sarcasm is a complex linguistic phenomenon involving humor, criticism, or phrases that convey the opposite of their literal meaning, masking true feelings, and it plays a pivotal role in many aspects of communication. Identifying sarcasm is therefore essential for sentiment analysis, social media monitoring, and customer service, as it enables a better understanding of public sentiment. Moreover, social media has become a primary platform for people to express their feelings and opinions and to provide feedback to businesses and service providers. Misinterpreting sarcasm in customer feedback can lead to incorrect responses and actions. However, accurately detecting sarcasm is challenging because it depends on context, cultural factors, and inherent ambiguity. Despite the abundance of research and resources in Machine Learning (ML) for detecting sarcasm in English, including Deep Learning (DL) techniques, there is still a shortage of research on sarcasm detection in Arabic, particularly in DL methodologies and available sarcastic datasets. In this paper, we construct a new Arabic sarcastic corpus and fine-tune three pretrained Arabic transformer-based Language Models (LMs) for Arabic sarcasm detection. We also propose a hybrid DL approach for sarcasm detection that combines static and contextualized representations from pretrained models, namely Word2Vec word embeddings and Bidirectional Encoder Representations from Transformers (BERT) models pretrained on Arabic resources. The proposed enhanced hybrid deep learning approach outperforms state-of-the-art models by 8% on a shared benchmark dataset and achieves a 5% improvement in F1-score on another.
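The abstract does not specify the exact architecture of the hybrid approach, but one common way to combine static (Word2Vec) and contextualized (BERT) representations is to concatenate the two per-token vectors along the feature axis before classification. A minimal sketch of that idea, using stand-in random vectors and hypothetical dimensions (300 for Word2Vec, 768 for a BERT-base model), might look like:

```python
import numpy as np

# Hypothetical dimensions; the abstract does not state the ones used.
W2V_DIM = 300    # typical Word2Vec embedding size (static)
BERT_DIM = 768   # typical BERT-base hidden size (contextualized)

def hybrid_representation(w2v_vecs, bert_vecs):
    """Concatenate static and contextualized token representations
    along the feature axis, one standard way to fuse the two."""
    assert w2v_vecs.shape[0] == bert_vecs.shape[0]  # same token count
    return np.concatenate([w2v_vecs, bert_vecs], axis=-1)

# Toy example: a 5-token "tweet" with random stand-in embeddings.
rng = np.random.default_rng(0)
tokens = 5
w2v = rng.normal(size=(tokens, W2V_DIM))
bert = rng.normal(size=(tokens, BERT_DIM))
fused = hybrid_representation(w2v, bert)
print(fused.shape)  # (5, 1068): one 1068-d fused vector per token
```

In practice the fused per-token matrix would feed a downstream classifier (e.g. a recurrent or dense layer); the sketch only illustrates the fusion step, not the paper's full model.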
Pages: 14
References
74 items
[1]  
Abdelali A., 2021, Pre-training BERT on Arabic tweets: Practical considerations
[2]  
El Mahdaouy A., 2022, arXiv.org
[3]  
Abdul-Mageed M, 2020, AraNet: A deep learning toolkit for Arabic social media, P11
[4]  
Abdul-Mageed M., 2021, P 59 ANN M ASS COMP, P7088
[5]  
Abdul-Mageed M, 2020, arXiv.org
[6]  
Abu Farha I., 2020, P 4 WORKSHOP OPEN SO, P32
[7]  
Abu Farha I, 2022, PROCEEDINGS OF THE 16TH INTERNATIONAL WORKSHOP ON SEMANTIC EVALUATION, SEMEVAL-2022, P802
[8]  
Abu Farha I, 2019, FOURTH ARABIC NATURAL LANGUAGE PROCESSING WORKSHOP (WANLP 2019), P192
[9]   Self-Deprecating Sarcasm Detection: An Amalgamation of Rule-Based and Machine Learning Approach [J].
Abulaish, Muhammad ;
Kamal, Ashraf .
2018 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2018), 2018, :574-579
[10]  
Al-Ghadhban D, 2017, 2017 INTERNATIONAL CONFERENCE ON ENGINEERING & MIS (ICEMIS)