A study of the performance of embedding methods for Arabic short-text sentiment analysis using deep learning approaches

Cited by: 22
Authors
Alwehaibi, Ali [1]
Bikdash, Marwan [1]
Albogmi, Mohammad [2]
Roy, Kaushik [3]
Affiliations
[1] North Carolina A&T State Univ, Dept Computat Data Sci & Engn, Greensboro, NC 27411 USA
[2] Taif Univ, Dept Arab Language, Taif 26571, Saudi Arabia
[3] North Carolina A&T State Univ, Dept Comp Sci, Greensboro, NC 27411 USA
Keywords
Arabic short-text; Sentiment analysis; Deep learning; Embedding; Ensemble
DOI
10.1016/j.jksuci.2021.07.011
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Sentiment analysis aims to classify a text according to the sentiment polarity of people's opinions, such as positive, negative, or neutral. While most studies focus on eliciting features from English text, research on Arabic is limited due to the morphological and grammatical complexity of the Arabic language. In this paper, we proposed an optimized sentiment classification for dialectal Arabic short text at the document level using deep learning (DL). The contributions of this paper are in three areas. First, we extracted semantic features for Arabic short text at the word level and character level. Second, we used three DL topologies for classification models: a long short-term memory recurrent neural network (LSTM); a convolutional neural network (CNN); and an ensemble model combining both models' advantages to improve the prediction performance. Third, we used a hyperparameter tuning estimation method to optimize the neural network performance. We trained and tested our proposed models on a dataset that consists of a Modern Standard Arabic and dialectal Arabic corpus collected from Twitter. The results showed significant improvement in Arabic text classification in terms of classification accuracy, which ranges between 88% and 69.7%. The ensemble model scored the highest accuracy of 96.7% on the test set. (c) 2021 The Authors. Published by Elsevier B.V. on behalf of King Saud University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
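The abstract names two ideas that a small sketch can make concrete: extracting units at the word level versus the character level, and combining two classifiers into an ensemble. The abstract does not say how the character-level features are built or how the ensemble merges the CNN and LSTM outputs, so the sketch below rests on assumptions: character n-grams with boundary markers for the character level, and soft voting (a weighted average of class probabilities) for the ensemble. The names `word_tokens`, `char_ngrams`, `soft_vote`, and `LABELS` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical label order; the abstract mentions these three polarities.
LABELS = ("negative", "neutral", "positive")


def word_tokens(text: str) -> List[str]:
    """Word-level units: whitespace-separated tokens."""
    return text.split()


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character-level units: overlapping n-grams inside each word.

    Each word is wrapped in '<' and '>' so that prefixes and suffixes
    are distinguishable from word-internal n-grams (an assumption,
    not the paper's stated procedure).
    """
    grams: List[str] = []
    for word in text.split():
        padded = f"<{word}>"
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def soft_vote(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    weight_a: float = 0.5,
) -> Tuple[str, List[float]]:
    """Average two class-probability vectors and pick the top label.

    probs_a and probs_b stand in for the outputs of two trained models
    (for example a CNN and an LSTM); no model is trained here.
    """
    combined = [
        weight_a * a + (1.0 - weight_a) * b
        for a, b in zip(probs_a, probs_b)
    ]
    best = max(range(len(combined)), key=combined.__getitem__)
    return LABELS[best], combined


if __name__ == "__main__":
    print(word_tokens("good service today"))
    print(char_ngrams("good"))
    print(soft_vote([0.2, 0.1, 0.7], [0.5, 0.2, 0.3]))
```

Character n-grams are one common way to stay robust to the spelling variation typical of dialectal short text, since unseen word forms still share sub-word units with known ones. Soft voting is only one possible combination rule; stacking the two networks into a single model is another.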
Pages: 6140-6149
Page count: 10