A Partially Supervised Cross-Collection Topic Model for Cross-Domain Text Classification

Cited by: 27
Authors
Bao, Yang [1]
Collier, Nigel [2]
Datta, Anindya [1]
Affiliations
[1] Natl Univ Singapore, Sch Comp, Singapore 117417, Singapore
[2] Natl Inst Informat, Tokyo 1018430, Japan
Source
PROCEEDINGS OF THE 22ND ACM INTERNATIONAL CONFERENCE ON INFORMATION & KNOWLEDGE MANAGEMENT (CIKM'13) | 2013
Keywords
Topic Modeling; LDA; Cross-Domain Learning; Text Classification
DOI
10.1145/2505515.2505556
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cross-domain text classification aims to train an accurate text classifier for a target domain using labeled text data from a related source domain. One of the most promising approaches is to induce a new feature representation that reduces the distributional difference between domains, so that a more accurate classifier can be learned in the new feature space. However, most existing methods do not explore the duality of the marginal distribution of examples and the conditional distribution of class labels given labeled training examples in the source domain. Moreover, few previous works attempt to explicitly distinguish domain-independent from domain-specific latent features and to align the domain-specific features to further improve cross-domain learning. In this paper, we propose the Partially Supervised Cross-Collection LDA topic model (PSCCLDA) for cross-domain learning, which addresses these two issues in a unified way. Experimental results on nine datasets show that our model outperforms two standard classifiers and four state-of-the-art methods, demonstrating its effectiveness.
Pages: 239-247
Number of pages: 9