Deep analysis of an Arabic sentiment classification system based on lexical resource expansion and custom approaches building

Cited by: 0
Authors
Ibtissam Touahri
Azzeddine Mazroui
Affiliation
[1] University Mohammed First, Department of Computer Science, Faculty of Sciences
Source
International Journal of Speech Technology | 2021 / Vol. 24
Keywords
Sentiment analysis; Opinion mining; Arabic language; Lexicon; Corpus; Lemmatization
DOI
Not available
Abstract
Sentiment analysis aims to extract emotions from large volumes of data. This paper studies the impact of lexical resource enrichment on Arabic sentiment analysis. Because Arabic lexical resources for sentiment analysis are scarce, we first build new resources using several lexicon construction methods. The first method is manual and consists of extracting sentiment words from a selected dataset; the second is semi-automatic and consists of translating an English lexicon into Arabic, followed by a manual check. Both methods generate terms in word form. In addition, the paper enriches an existing resource that contains terms related to four specific domains by creating its lemmatized equivalent. These methods yield lexicons with different morphologies that enrich the existing Arabic resources. The resources are then used to develop a polarity classifier. The paper explains the steps followed to construct the different lexical resources, defines the pre-processing levels, and gives statistics for each lexicon. We then present the classification approaches used to determine the polarity of new data. To analyze the results in depth with respect to the extracted features, we opt for unsupervised and supervised approaches whose internal architecture and processing are easy to inspect. The experiments vary the feature sets and apply a feature selection approach to keep the most pertinent features and reduce the size of the feature vector. Moreover, we perform an in-depth analysis of the feature vectors and the nature of the corpus, and we explain the main causes behind improvements and degradations in the results. The tests carried out show the relevance of each component of the system.
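As a rough illustration of the unsupervised, lexicon-based side of such a system, the sketch below counts positive and negative lexicon hits in a whitespace-tokenized text. It is not the authors' classifier: the tiny lexicons, the `polarity` function name, and the tie-breaking rule are all assumptions made for illustration, and no lemmatization or other pre-processing is applied.

```python
# Hypothetical toy lexicons in word form; the paper's actual resources
# are not reproduced here.
POSITIVE = {"جميل", "ممتاز", "رائع"}
NEGATIVE = {"سيء", "رديء", "ممل"}


def polarity(text: str) -> str:
    """Label a text by comparing counts of positive and negative lexicon hits."""
    tokens = text.split()  # naive whitespace tokenization (assumption)
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"  # ties and texts with no lexicon hits (assumption)
```

A lemmatized lexicon, as described in the abstract, would require mapping each token to its lemma before the lookup; that step is omitted here.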
Pages: 109-126
Page count: 17