Empirical study on imbalanced learning of Arabic sentiment polarity with neural word embedding

Times cited: 7
Authors
El-Alfy, El-Sayed M. [1]
Al-Azani, Sadam [1]
Affiliation
[1] King Fahd Univ Petr & Minerals, Informat & Comp Sci Dept, Dhahran, Saudi Arabia
Keywords
social network; sentiment analysis; polarity detection; word embedding; machine learning; imbalanced dataset; Arabic tweets; classification; SMOTE
DOI
10.3233/JIFS-179703
Chinese Library Classification number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the proliferation of social media and mobile technology, a huge amount of unstructured data is posted online every day. Consequently, sentiment analysis has gained increasing importance as a tool for understanding the opinions of particular groups of people on contemporary political, cultural, social, or commercial issues. Compared with research on Western languages, research on sentiment analysis for dialectal Arabic is still in its early stages, and several challenges remain to be addressed. The goal of this study is twofold. First, it compares the performance of core machine learning algorithms for detecting polarity in imbalanced Arabic tweet datasets, using neural word embeddings as the feature extractor instead of hand-crafted or traditional features. Second, it examines the impact of various oversampling techniques for handling the highly imbalanced nature of the sentiment data. An extensive empirical analysis of nine machine learning methods and six oversampling methods has been conducted, and the results are discussed in terms of a wide range of performance measures.
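The two ingredients named in the abstract, embedding-based tweet features and oversampling of the minority polarity class, can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each tweet is represented by averaging the vectors of its known words, and it uses plain SMOTE as one representative oversampling technique (the abstract does not name the six methods compared). The toy embedding table, the tokens, and the helper names `tweet_vector`, `smote`, and `balance` are hypothetical.

```python
import math
import random

def tweet_vector(tokens, embeddings, dim):
    """Average the vectors of in-vocabulary tokens; zero vector if none are known.
    (Assumption: averaging is one common way to turn word vectors into a tweet vector.)"""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def smote(minority, n_new, k=5, seed=0):
    """Plain SMOTE: each synthetic point lies on the segment between a random
    minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    k = min(k, len(minority) - 1)
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples (Euclidean distance)
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic

def balance(X, y, minority_label, seed=0):
    """Binary case: oversample the minority class until both classes have equal size."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_majority = len(y) - len(minority)
    new = smote(minority, max(0, n_majority - len(minority)), seed=seed)
    return X + new, y + [minority_label] * len(new)

# Hypothetical 2-D embeddings; real ones would be learned from a tweet corpus.
emb = {"جميل": [0.9, 0.1], "رائع": [0.7, 0.3], "سيء": [-0.8, 0.2], "ممل": [-0.6, -0.1]}
tweets = [["جميل"], ["رائع"], ["جميل", "رائع"], ["رائع", "جدا"], ["سيء"], ["ممل", "سيء"]]
labels = ["pos", "pos", "pos", "pos", "neg", "neg"]

X = [tweet_vector(t, emb, dim=2) for t in tweets]
X_bal, y_bal = balance(X, labels, minority_label="neg")
print(y_bal.count("pos"), y_bal.count("neg"))  # 4 4
```

The balanced feature vectors `X_bal` would then be passed to whichever classifier is being evaluated. Oversampling should be applied only to the training split, so that synthetic tweets never appear in the test data used for the performance measures.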
Pages: 6211-6222
Number of pages: 12