AraXLNet: pre-trained language model for sentiment analysis of Arabic

Cited by: 0

Authors:

Alhanouf Alduailej

Abdulrahman Alothaim

Affiliation:

[1] King Saud University, Department of Information Systems, College of Computer and Information Sciences

Source:

Journal of Big Data, Vol. 9

Keywords:

Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining

Abstract:

The Arabic language is complex and has few resources, which makes it challenging to perform text classification tasks such as sentiment analysis accurately. The main goal of sentiment analysis is to determine the overall orientation of a given text: positive, negative, or neutral. Recently, language models have shown strong results in improving the accuracy of text classification in English. These models are pre-trained on a large dataset and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that similar success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on the Arabic sentiment analysis task using multiple benchmark datasets. These results outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improvement held across the benchmark datasets, a promising advance for Arabic text classification tasks.

Cite as:

Alduailej, Alhanouf; Alothaim, Abdulrahman. AraXLNet: pre-trained language model for sentiment analysis of Arabic. Journal of Big Data, 2022, 9 (01).