AraXLNet: pre-trained language model for sentiment analysis of Arabic

Cited by: 15
Authors
Alduailej, Alhanouf [1]
Alothaim, Abdulrahman [1]
Affiliations
[1] King Saud Univ, Dept Informat Syst, Coll Comp & Informat Sci, Riyadh 11451, Saudi Arabia
Keywords
Sentiment analysis; Language models; NLP; XLNet; AraXLNet; Text mining; Neural network
DOI
10.1186/s40537-022-00625-z
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Arabic is a complex language with limited resources, which makes accurate text classification tasks such as sentiment analysis challenging. The main goal of sentiment analysis is to determine whether the overall orientation of a given text is positive, negative, or neutral. Recently, language models have shown great results in improving the accuracy of text classification in English. These models are pre-trained on a large dataset and then fine-tuned on downstream tasks. In particular, XLNet has achieved state-of-the-art results on diverse natural language processing (NLP) tasks in English. In this paper, we hypothesize that comparable success can be achieved in Arabic. The paper aims to support this hypothesis by producing the first XLNet-based language model for Arabic, called AraXLNet, and demonstrating its use in Arabic sentiment analysis to improve prediction accuracy. The results showed that the proposed model, AraXLNet, with the Farasa segmenter achieved accuracies of 94.78%, 93.01%, and 85.77% on the Arabic sentiment analysis task using multiple benchmark datasets. These results outperformed AraBERT, which obtained 84.65%, 92.13%, and 85.05% on the same datasets, respectively. The improved accuracy of the proposed model was evident across multiple benchmark datasets, offering a promising advance in Arabic text classification.
Pages: 21