Cross-Domain Learning from Multiple Sources: A Consensus Regularization Perspective

Cited by: 48
Authors
Zhuang, Fuzhen [1 ]
Luo, Ping
Xiong, Hui [2 ]
Xiong, Yuhong
He, Qing [1 ]
Shi, Zhongzhi [1 ]
Affiliations
[1] Chinese Acad Sci, Key Lab Intelligent Informat Proc, Inst Comp Technol, Beijing 100190, Peoples R China
[2] Rutgers State Univ, Management Sci & Informat Syst Dept, Rutgers Business Sch Newark & New Brunswick, Newark, NJ 07102 USA
Funding
US National Science Foundation (NSF);
Keywords
Classification; multiple source domains; cross-domain learning; consensus regularization;
DOI
10.1109/TKDE.2009.205
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification across different domains studies how to adapt a learning model from one domain to another domain that shares similar data characteristics. While there are a number of existing works along this line, many of them focus only on learning from a single source domain to a target domain. In particular, a remaining challenge is how to apply the knowledge learned from multiple source domains to a target domain. Indeed, data from multiple source domains can be semantically related but have different data distributions, and it is not clear how to exploit the distribution differences among multiple source domains to boost the learning performance in a target domain. To that end, in this paper, we propose a consensus regularization framework for learning from multiple source domains to a target domain. In this framework, a local classifier is trained by considering both the local data available in one source domain and the prediction consensus with the classifiers learned from the other source domains. Moreover, we provide a theoretical analysis as well as an empirical study of the proposed consensus regularization framework. The experimental results on text categorization and image classification problems show the effectiveness of this consensus regularization learning method. Finally, to deal with the situation where the multiple source domains are geographically distributed, we also develop a distributed version of the proposed algorithm, which avoids the need to upload all the data to a centralized location and helps to mitigate privacy concerns.
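The abstract's core idea — each source domain trains a local classifier on its own labeled data while being regularized toward prediction consensus with the other domains' classifiers on unlabeled target data — can be illustrated with a toy sketch. This is not the paper's exact objective: the squared-disagreement penalty, the hyperparameters `lam` and `lr`, and the synthetic data below are illustrative assumptions, not the authors' formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_consensus(sources, X_target, lam=1.0, lr=0.1, epochs=200, seed=0):
    """sources: list of (X, y) pairs, one per source domain.
    X_target: unlabeled target-domain feature matrix.
    Returns one linear-classifier weight vector per source domain."""
    rng = np.random.default_rng(seed)
    d = sources[0][0].shape[1]
    W = [rng.normal(scale=0.01, size=d) for _ in sources]
    for _ in range(epochs):
        # predictions of every local classifier on the target domain
        P = np.array([sigmoid(X_target @ w) for w in W])
        for k, (Xs, ys) in enumerate(sources):
            # gradient of the local logistic loss on this source's labeled data
            grad = Xs.T @ (sigmoid(Xs @ W[k]) - ys) / len(ys)
            # consensus term: pull this classifier's target predictions
            # toward the average prediction of the other classifiers
            others = np.delete(P, k, axis=0).mean(axis=0)
            diff = (P[k] - others) * P[k] * (1.0 - P[k])
            grad += lam * X_target.T @ diff / len(X_target)
            W[k] -= lr * grad
    return W
```

With `lam=0` this degenerates into independent per-source classifiers; the consensus term is what couples them through the unlabeled target data, which is also what makes the distributed variant mentioned in the abstract possible, since only target-domain predictions (not raw source data) need to be exchanged.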
Pages: 1664-1678
Number of pages: 15