Domain Adaptation Using Domain Similarity- and Domain Complexity-based Instance Selection for Cross-domain Sentiment Analysis

Cited by: 35
Author
Remus, Robert [1]
Affiliation
[1] Univ Leipzig, Dept Comp Sci, Nat Language Proc Grp, D-04109 Leipzig, Germany
Source
12TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW 2012) | 2012
Keywords
Domain adaptation; Domain similarity; Domain complexity; Instance selection; Cross-domain sentiment analysis; Polarity classification
DOI
10.1109/ICDMW.2012.46
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose an approach to domain adaptation that selects the instances from a source domain training set that are most similar to a target domain. The factor by which the original source domain training set is reduced is determined automatically by measuring domain similarity between the source and target domains as well as their domain complexity variance. Domain similarity is measured as divergence between term unigram distributions. Domain complexity is measured as homogeneity, i.e., self-similarity. We evaluate our approach in a semi-supervised cross-domain document-level polarity classification experiment. We show that it yields small but statistically significant improvements over several natural baselines and achieves results competitive with other state-of-the-art domain adaptation schemes.
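The selection idea in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not confirm: Jensen-Shannon divergence stands in for the unspecified "divergence", tokenization is plain lowercased whitespace splitting, and the reduction factor is passed in by hand as the hypothetical parameter `keep_fraction`, whereas the paper derives it automatically from domain similarity and domain complexity variance. The function names are illustrative only.

```python
import math
from collections import Counter


def unigram_distribution(documents):
    """Relative frequency of each lowercased, whitespace-separated term."""
    counts = Counter(term for doc in documents for term in doc.lower().split())
    total = sum(counts.values())
    return {term: n / total for term, n in counts.items()}


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two term distributions.

    Assumption: the abstract only says "divergence"; JS divergence is one
    common, symmetric, bounded choice.
    """
    divergence = 0.0
    for term in set(p) | set(q):
        p_t, q_t = p.get(term, 0.0), q.get(term, 0.0)
        m_t = 0.5 * (p_t + q_t)  # mixture distribution at this term
        if p_t > 0:
            divergence += 0.5 * p_t * math.log2(p_t / m_t)
        if q_t > 0:
            divergence += 0.5 * q_t * math.log2(q_t / m_t)
    return divergence


def select_instances(source_docs, target_docs, keep_fraction):
    """Keep the source documents whose unigram distribution is closest
    to the target domain's unigram distribution.

    keep_fraction is a hypothetical, manually chosen stand-in for the
    automatically determined reduction factor described in the abstract.
    """
    target_dist = unigram_distribution(target_docs)
    ranked = sorted(
        source_docs,
        key=lambda doc: js_divergence(unigram_distribution([doc]), target_dist),
    )
    keep = max(1, round(keep_fraction * len(ranked)))
    return ranked[:keep]
```

Under these assumptions, calling `select_instances(source_docs, target_docs, 0.5)` would return the half of the source training documents that is lexically closest to the target domain; a polarity classifier would then be trained on that subset only.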
Pages: 717-723
Page count: 7