Transfer learning for class imbalance problems with inadequate data

Cited by: 0
Authors
Samir Al-Stouhi
Chandan K. Reddy
Affiliations
[1] Honda Automobile Technology Research, Department of Computer Science
[2] Wayne State University
Source
Knowledge and Information Systems | 2016 / Vol. 48
Keywords
Rare class; Transfer learning; Class imbalance; AdaBoost; Weighted majority algorithm; Healthcare informatics; Text mining
DOI
Not available
Abstract
A fundamental problem in data mining is building robust classifiers in the presence of skewed data distributions. Class imbalance classifiers are trained specifically for datasets with skewed distributions. Existing methods assume an ample supply of training examples as a prerequisite for constructing an effective classifier. When sufficient data are not readily available, however, developing a representative classification algorithm becomes even more difficult because of the unequal distribution between classes. We provide a unified framework that can take advantage of auxiliary data through a transfer learning mechanism while simultaneously building a robust classifier that tackles class imbalance when only a few training samples are available in the target domain of interest. Transfer learning methods use auxiliary data to augment learning when training examples are insufficient; in this paper, we develop a method optimized to simultaneously augment the training data and induce balance in skewed datasets. We propose a novel boosting-based instance transfer classifier with a label-dependent update mechanism that simultaneously compensates for class imbalance and incorporates samples from an auxiliary domain to improve classification. We provide theoretical and empirical validation of our method and apply it to healthcare and text classification applications.
Pages: 201-228
Page count: 27