Transfer learning for class imbalance problems with inadequate data

Citations: 0
Authors
Samir Al-Stouhi
Chandan K. Reddy
Affiliations
[1] Honda Automobile Technology Research
[2] Department of Computer Science, Wayne State University
Source
Knowledge and Information Systems | 2016 / Vol. 48
Keywords
Rare class; Transfer learning; Class imbalance; AdaBoost; Weighted majority algorithm; Healthcare informatics; Text mining
DOI
Not available
Abstract
A fundamental problem in data mining is to effectively build robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for skewed distribution datasets. Existing methods assume an ample supply of training examples as a fundamental prerequisite for constructing an effective classifier. However, when sufficient data are not readily available, the development of a representative classification algorithm becomes even more difficult due to the unequal distribution between classes. We provide a unified framework that can take advantage of auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles the imbalance issue when few training samples are available in a particular target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are not sufficient, and in this paper we develop a method that is optimized to simultaneously augment the training data and induce balance into skewed datasets. We propose a novel boosting-based instance transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
Pages: 201-228
Page count: 27