Enhancing deep learning sentiment analysis with ensemble techniques in social applications

Cited by: 339
Authors
Araque, Oscar [1]
Corcuera-Platas, Ignacio [1]
Sanchez-Rada, J. Fernando [1]
Iglesias, Carlos A. [1]
Affiliations
[1] Univ Politecn Madrid, Escuela Tecn Super Ingn Telecommun, Dept Ingn Sistemas Telemat, Ave Complutense 30, Madrid, Spain
Funding
EU Horizon 2020;
Keywords
Ensemble; Deep learning; Sentiment analysis; Machine learning; Natural language processing; POLARITY; CLASSIFICATION;
DOI
10.1016/j.eswa.2017.02.002
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Deep learning techniques for Sentiment Analysis have become very popular. They provide automatic feature extraction and both richer representation capabilities and better performance than traditional feature-based techniques (i.e., surface methods). Traditional surface approaches are based on complex manually extracted features, and this extraction process is a fundamental question in feature-driven methods. These long-established approaches can yield strong baselines, and their predictive capabilities can be used in conjunction with the emerging deep learning methods. In this paper we seek to improve the performance of deep learning techniques by integrating them with traditional surface approaches based on manually extracted features. The contributions of this paper are sixfold. First, we develop a deep learning based sentiment classifier using a word embeddings model and a linear machine learning algorithm. This classifier serves as a baseline against which subsequent results are compared. Second, we propose two ensemble techniques which aggregate our baseline classifier with other surface classifiers widely used in Sentiment Analysis. Third, we also propose two models for combining both surface and deep features to merge information from several sources. Fourth, we introduce a taxonomy for classifying the different models found in the literature, as well as the ones we propose. Fifth, we conduct several experiments to compare the performance of these models with the deep learning baseline. For this, we use seven public datasets that were extracted from the microblogging and movie reviews domains. Finally, as a result, a statistical study confirms that the performance of these proposed models surpasses that of our original baseline on F1-Score. © 2017 The Authors. Published by Elsevier Ltd.
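The abstract does not say which aggregation rules the two ensemble techniques use, nor how surface and deep features are merged. As a rough illustration only, the sketch below assumes simple majority voting for the ensemble and plain vector concatenation for the feature combination. Every name in it (`TOY_EMBEDDINGS`, `lexicon_classifier`, `embedding_classifier`, `emoticon_classifier`, `majority_vote`, `combine_features`) is hypothetical, and the tiny word lists and 2-dimensional vectors are made-up stand-ins for a trained embeddings model and real surface classifiers.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical 2-d "embeddings"; a real system would load a trained model.
TOY_EMBEDDINGS: Dict[str, Tuple[float, float]] = {
    "good": (1.0, 0.2), "great": (0.9, 0.1), "love": (0.8, 0.3),
    "bad": (-1.0, 0.3), "awful": (-0.9, 0.2), "boring": (-0.7, 0.1),
}
POSITIVE_WORDS = {"good", "great", "love"}
NEGATIVE_WORDS = {"bad", "awful", "hate"}

Classifier = Callable[[str], str]


def mean_embedding(text: str) -> List[float]:
    """Average the vectors of known tokens (zeros if none are known)."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vectors:
        return [0.0, 0.0]
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]


def embedding_classifier(text: str) -> str:
    """Stand-in for the deep baseline: averaged embeddings + a linear rule."""
    weights = (1.0, 0.0)  # assumed weights, not learned here
    score = sum(w * x for w, x in zip(weights, mean_embedding(text)))
    return "positive" if score >= 0 else "negative"


def lexicon_classifier(text: str) -> str:
    """Toy surface classifier based on a hand-made polarity lexicon."""
    tokens = text.lower().split()
    score = sum(t in POSITIVE_WORDS for t in tokens) - sum(t in NEGATIVE_WORDS for t in tokens)
    return "positive" if score >= 0 else "negative"


def emoticon_classifier(text: str) -> str:
    """Toy surface classifier that only looks at two emoticons."""
    tokens = text.split()
    score = tokens.count(":)") - tokens.count(":(")
    return "positive" if score >= 0 else "negative"


def majority_vote(classifiers: Sequence[Classifier], text: str) -> str:
    """Ensemble by majority voting (an assumed rule); use an odd number of voters."""
    votes = Counter(clf(text) for clf in classifiers)
    return votes.most_common(1)[0][0]


def combine_features(text: str) -> List[float]:
    """Concatenate deep features (mean embedding) with surface lexicon counts."""
    tokens = text.lower().split()
    surface = [float(sum(t in POSITIVE_WORDS for t in tokens)),
               float(sum(t in NEGATIVE_WORDS for t in tokens))]
    return mean_embedding(text) + surface


CLASSIFIERS = [embedding_classifier, lexicon_classifier, emoticon_classifier]
```

Calling `majority_vote(CLASSIFIERS, "bad :)")` shows the point of the ensemble: the emoticon rule votes positive, but the other two voters outweigh it. The vector returned by `combine_features` could be fed to any linear classifier in place of the embedding features alone.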
Pages: 236-246
Page count: 11