Triplex Transfer Learning: Exploiting Both Shared and Distinct Concepts for Text Classification

Cited by: 52
Authors
Zhuang, Fuzhen [1 ]
Luo, Ping [2 ]
Du, Changying [1 ]
He, Qing [1 ]
Shi, Zhongzhi [1 ]
Xiong, Hui [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Comp Technol, Key Lab Intelligent Informat Proc, Beijing 100864, Peoples R China
[2] Hewlett Packard Labs, Beijing 100084, Peoples R China
[3] Rutgers State Univ, Rutgers Business Sch, Management Sci & Informat Syst Dept, Newark, NJ 08901 USA
Funding
National Natural Science Foundation of China;
Keywords
Common concept; distinct concept; distribution mismatch; nonnegative matrix trifactorization; triplex transfer learning;
DOI
10.1109/TCYB.2013.2281451
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Transfer learning addresses learning scenarios in which the test data from target domains and the training data from source domains are drawn from similar but different distributions over the raw features. Along this line, some recent studies have revealed that high-level concepts, such as word clusters, can help model the differences between data distributions and are thus more appropriate for classification. In other words, these methods assume that all data domains share the same set of concepts, which serve as the bridge for knowledge transfer. However, in addition to these shared concepts, each domain may have its own distinct concepts. In light of this, we systematically analyze the high-level concepts and propose a general transfer learning framework based on nonnegative matrix tri-factorization, which explores both shared and distinct concepts among all the domains simultaneously. Since this model provides more flexibility in fitting the data, it can lead to better classification accuracy. Moreover, we propose to regularize the manifold structure in the target domains to further improve prediction performance. To solve the resulting optimization problem, we develop an iterative algorithm and theoretically analyze its convergence properties. Finally, extensive experiments show that the proposed model outperforms the baseline methods by a significant margin. In particular, our method works much better on the more challenging tasks in which distinct concepts are present in the data.
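To make the nonnegative matrix tri-factorization building block concrete, the following is a minimal Python/NumPy sketch of the standard multiplicative updates for X ≈ F S G^T, the decomposition the framework builds on. It is not the paper's triplex algorithm, which additionally partitions the word-concept factor into shared and distinct blocks across domains and adds manifold regularization; the function name and parameters here are hypothetical.

import numpy as np

def tri_factorize(X, k_row, k_col, n_iter=200, eps=1e-9, seed=0):
    # Approximate a nonnegative matrix X (m x n) as F @ S @ G.T, where
    # F (m x k_row) soft-assigns rows (e.g., documents) to clusters,
    # G (n x k_col) soft-assigns columns (e.g., words) to concepts,
    # and S (k_row x k_col) links the two cluster spaces.
    rng = np.random.default_rng(seed)
    m, n = X.shape
    F = rng.random((m, k_row))
    S = rng.random((k_row, k_col))
    G = rng.random((n, k_col))
    for _ in range(n_iter):
        # Multiplicative updates for the squared Frobenius loss;
        # eps guards against division by zero.
        GS = G @ S.T                               # n x k_row
        F *= (X @ GS) / (F @ (GS.T @ GS) + eps)
        FS = F @ S                                 # m x k_col
        G *= (X.T @ FS) / (G @ (FS.T @ FS) + eps)
        S *= (F.T @ X @ G) / (F.T @ F @ S @ (G.T @ G) + eps)
    return F, S, G

# Usage sketch: factorize a random documents-by-words count matrix.
X = np.random.default_rng(1).poisson(1.0, size=(100, 500)).astype(float)
F, S, G = tri_factorize(X, k_row=4, k_col=8)

In a text-classification setting, F then gives soft document-cluster memberships and G gives soft word-concept memberships; the paper's framework constrains parts of G to be shared across domains while leaving other parts domain-specific.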
Pages: 1191-1203
Page count: 13