NileULex: A Phrase and Word Level Sentiment Lexicon for Egyptian and Modern Standard Arabic

Cited by: 0
Authors
El-Beltagy, Samhaa R. [1]
Affiliations
[1] Nile Univ, Juhayna Sq, Giza, Egypt
Source
LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION | 2016
Keywords
Arabic sentiment analysis; Arabic sentiment lexicons; Arabic idioms; Arabic opinion mining
DOI
Not available
Chinese Library Classification number
H [Language and Writing]
Discipline classification code
05
Abstract
This paper presents NileULex, an Arabic sentiment lexicon containing close to six thousand Arabic words and compound phrases. Forty-five percent of the terms and expressions in the lexicon are Egyptian or colloquial, while fifty-five percent are Modern Standard Arabic. The lexicon was developed over the past two years. While many of the terms included in the lexicon were collected automatically, every term was added manually. One of the important criteria for adding terms to the lexicon was that they be as unambiguous as possible. The result is a lexicon of much higher quality than any translated or automatically constructed variant. To demonstrate that a lexicon such as this can directly impact the task of sentiment analysis, a very basic machine-learning-based sentiment analyser that uses unigrams, bigrams, and lexicon-based features was applied to two different Twitter datasets. The obtained results were compared to a baseline system that uses only unigrams and bigrams. The same lexicon-based features were also generated using a publicly available translation of a popular sentiment lexicon. The experiments show that using the developed lexicon improves the results over both the baseline and the publicly available lexicon.
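
The abstract describes combining unigram and bigram features with lexicon-based features, where the lexicon holds both single words and compound phrases. The following is a minimal sketch of what such feature extraction might look like, not the paper's actual implementation. The lexicon entries are English placeholders invented for illustration (they are not taken from NileULex), and the function names, whitespace tokenisation, and longest-match rule are all assumptions.

```python
from collections import Counter

# Toy stand-in lexicon: hypothetical entries, NOT taken from NileULex.
# Keys are token tuples so that compound phrases can be matched as well.
TOY_LEXICON = {
    ("good",): "positive",
    ("bad",): "negative",
    ("over", "the", "moon"): "positive",
    ("waste", "of", "time"): "negative",
}


def ngram_features(tokens):
    """Count unigrams and bigrams (the baseline feature set)."""
    feats = Counter(("uni", t) for t in tokens)
    feats.update(("bi", a, b) for a, b in zip(tokens, tokens[1:]))
    return feats


def lexicon_features(tokens, lexicon=TOY_LEXICON):
    """Count positive and negative lexicon hits, preferring the longest match."""
    counts = {"positive": 0, "negative": 0}
    max_len = max((len(key) for key in lexicon), default=0)
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(tokens[i:i + n]))
            if label is not None:
                counts[label] += 1
                i += n  # skip past the matched phrase
                break
        else:
            i += 1  # no entry starts at this token
    return counts


def extract_features(text, lexicon=TOY_LEXICON):
    """Merge n-gram counts with lexicon hit counts into one feature map."""
    tokens = text.lower().split()  # assumed whitespace tokenisation
    feats = ngram_features(tokens)
    for label, count in lexicon_features(tokens, lexicon).items():
        feats[("lex", label)] = count
    return feats


features = extract_features("good film but a waste of time")
```

The resulting feature map could then be passed to any standard classifier. Dropping the `("lex", ...)` entries would give the unigram and bigram baseline that the abstract compares against.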
Pages: 2900-2905
Page count: 6