Cross-Domain Sentiment Classification Using a Sentiment Sensitive Thesaurus

Cited by: 128
Authors
Bollegala, Danushka [1 ]
Weir, David [2 ]
Carroll, John [2 ]
Affiliations
[1] Univ Tokyo, Dept Informat & Commun Engn, Grad Sch Informat Sci & Technol, Bunkyo Ku, Tokyo 1138656, Japan
[2] Univ Sussex, Dept Informat, Brighton BN1 9QJ, E Sussex, England
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Cross-domain sentiment classification; domain adaptation; thesauri creation
DOI
10.1109/TKDE.2012.103
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Automatic classification of sentiment is important for numerous applications such as opinion mining, opinion summarization, contextual advertising, and market analysis. Typically, sentiment classification has been modeled as the problem of training a binary classifier using reviews annotated for positive or negative sentiment. However, sentiment is expressed differently in different domains, and annotating corpora for every possible domain of interest is costly. Applying a sentiment classifier trained using labeled data for a particular domain to classify sentiment of user reviews on a different domain often results in poor performance because words that occur in the train (source) domain might not appear in the test (target) domain. We propose a method to overcome this problem in cross-domain sentiment classification. First, we create a sentiment sensitive distributional thesaurus using labeled data for the source domains and unlabeled data for both source and target domains. Sentiment sensitivity is achieved in the thesaurus by incorporating document level sentiment labels in the context vectors used as the basis for measuring the distributional similarity between words. Next, we use the created thesaurus to expand feature vectors during train and test times in a binary classifier. The proposed method significantly outperforms numerous baselines and returns results that are comparable with previously proposed cross-domain sentiment classification methods on a benchmark data set containing Amazon user reviews for different types of products. We conduct an extensive empirical analysis of the proposed method on single- and multisource domain adaptation, unsupervised and supervised domain adaptation, and numerous similarity measures for creating the sentiment sensitive thesaurus. Moreover, our comparisons against SentiWordNet, a lexical resource for word polarity, show that the created sentiment-sensitive thesaurus accurately captures words that express similar sentiments.
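The two steps named in the abstract (building a thesaurus from context vectors that include document-level sentiment labels, then expanding feature vectors with thesaurus neighbors) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes raw document co-occurrence counts as context features, cosine similarity as the distributional similarity measure, and a simple weighted expansion scheme. The function names, the `__LABEL__` and `EXP_` prefixes, and the `top_k` and `weight` parameters are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def build_context_vectors(labeled_docs, unlabeled_docs):
    """Map each word to a Counter of context features.

    labeled_docs: list of (tokens, label) pairs, label being "pos" or "neg".
    unlabeled_docs: list of token lists (source or target domain).
    Assumption: the context of a word is every other word in the same
    document, plus the document's sentiment label when one is available.
    """
    vectors = defaultdict(Counter)
    all_docs = list(labeled_docs) + [(tokens, None) for tokens in unlabeled_docs]
    for tokens, label in all_docs:
        vocab = set(tokens)
        for word in vocab:
            for other in vocab:
                if other != word:
                    vectors[word][other] += 1
            if label is not None:
                # The sentiment label enters the context vector as a feature.
                vectors[word]["__LABEL__" + label] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_thesaurus(vectors, top_k=2):
    """For each word, list up to top_k most similar words with their scores."""
    thesaurus = {}
    for word, vec in vectors.items():
        scored = [(cosine(vec, other_vec), other)
                  for other, other_vec in vectors.items() if other != word]
        # Sort by descending similarity, breaking ties alphabetically.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        thesaurus[word] = [(other, sim) for sim, other in scored[:top_k] if sim > 0]
    return thesaurus


def expand_features(tokens, thesaurus, weight=0.5):
    """Add weighted thesaurus neighbors to a bag-of-words feature vector."""
    original = Counter(tokens)
    expanded = Counter(original)
    for word, count in original.items():
        for neighbor, sim in thesaurus.get(word, []):
            # Expanded features carry a prefix so they stay distinct
            # from words actually observed in the document.
            expanded["EXP_" + neighbor] += weight * sim * count
    return expanded
```

In this sketch, a review that contains a word seen only in one domain still receives features for that word's thesaurus neighbors, which is the mechanism the abstract describes for bridging the vocabulary mismatch between source and target domains. The expanded vectors would then be passed to any binary classifier at both training and test time.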
Pages: 1719-1731
Number of pages: 13