An approach to the use of word embeddings in an opinion classification task

Times cited: 58
Authors
Enriquez, Fernando [1]
Troyano, Jose A. [1]
Lopez-Solaz, Tomas [1]
Affiliation
[1] Univ Seville, Dept Languages & Comp Syst, ETS Ingn Informat, Av Reina Mercedes S-N, E-41012 Seville, Spain
Keywords
Document classification; Opinion classification; Word embedding; Bag of words; WORD2VEC
DOI
10.1016/j.eswa.2016.09.005
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
In this paper we show how a vector-based word representation obtained via WORD2VEC can help improve the results of a document classifier based on bags of words. Both models produce numeric representations of texts, but in very different ways. The bag-of-words model represents documents as sparse, high-dimensional vectors whose indices are words or groups of words. WORD2VEC builds word-level representations as much more compact vectors, whose dimensions implicitly encode information about the contexts in which words occur. Bags of words are very effective for document classification, and in our experiments no representation using only WORD2VEC vectors outperforms them. However, this does not mean that the information provided by WORD2VEC is useless for the classification task: when it is combined with bags of words, the results improve, which shows that the two representations are complementary and that WORD2VEC contributes to the task. We have also performed cross-domain experiments in which WORD2VEC behaved much more stably than bag-of-words models. (C) 2016 Elsevier Ltd. All rights reserved.
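The abstract contrasts sparse bag-of-words vectors with compact WORD2VEC vectors and reports gains from combining the two. The following is a minimal sketch under stated assumptions, not the authors' pipeline: the embeddings are a hypothetical hand-written dictionary standing in for a trained WORD2VEC model, a document's embedding is taken as the average of its word vectors, and the two views are joined by simple concatenation. The abstract does not specify which combination scheme the paper actually uses.

```python
from collections import Counter

# HYPOTHETICAL toy embeddings standing in for trained WORD2VEC vectors.
# Values are made up; real models typically use a few hundred dimensions.
TOY_EMBEDDINGS = {
    "great": [0.9, 0.1, 0.0],
    "phone": [0.0, 0.8, 0.2],
    "terrible": [-0.9, 0.1, 0.0],
    "battery": [0.1, 0.7, 0.3],
}


def bag_of_words(tokens, vocabulary):
    """Count vector with one dimension per vocabulary word (mostly zeros in practice)."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def mean_embedding(tokens, embeddings, dim):
    """Dense document vector: average of the vectors of the tokens that have one."""
    known = [embeddings[t] for t in tokens if t in embeddings]
    if not known:
        return [0.0] * dim  # no known words: fall back to the zero vector
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def combined_features(tokens, vocabulary, embeddings, dim):
    """Concatenate both views (an ASSUMED scheme) so one classifier sees both."""
    return bag_of_words(tokens, vocabulary) + mean_embedding(tokens, embeddings, dim)


vocabulary = ["battery", "great", "phone", "terrible", "the"]
doc = "the phone is great great".split()
features = combined_features(doc, vocabulary, TOY_EMBEDDINGS, dim=3)
print(features)
```

Here the first five features are word counts over the toy vocabulary and the last three are the averaged embedding. With a realistic vocabulary the count part would have thousands of mostly zero entries while the dense part stays at the embedding size, and the concatenated vector would be passed to an ordinary classifier.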
Pages: 1-6
Page count: 6