Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Cited by: 14
Authors
Liu, Xu-Ying [1 ,2 ,3 ]
Wang, Sheng-Tao [1 ,2 ,3 ]
Zhang, Min-Ling [1 ,2 ,3 ]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China
[3] Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
Keywords
machine learning; data mining; class imbalance; over sampling; boosting; transfer learning; DATA-SETS; ENSEMBLES;
DOI
10.1007/s11704-018-7182-1
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and uses transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weight of a misclassified synthetic sample decreases and the weight of a misclassified original instance increases, while the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and therefore have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, so that minority class instances are more influential. Only the original data are used to estimate the error rate, which keeps the estimate immune to noise in the synthetic samples. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
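The reweighting rule described in the abstract can be illustrated with a minimal sketch. The abstract does not give the actual update formulas, so everything concrete here is an assumption made for illustration: the function names `update_weights` and `original_error_rate`, the multiplicative factors `beta_syn` and `beta_orig`, and the exponent `minority_boost` that makes minority class weights change more. It should not be read as the authors' implementation.

```python
def update_weights(weights, is_synthetic, is_minority, misclassified,
                   beta_syn=0.5, beta_orig=0.5, minority_boost=2.0):
    """One illustrative reweighting step (hypothetical factors, not the paper's).

    Misclassified synthetic samples are down-weighted, misclassified original
    instances are up-weighted, and correctly classified instances keep their
    weight. Minority class instances use a larger exponent, so their weights
    change more than those of majority class instances.
    """
    new_weights = []
    for w, syn, mino, wrong in zip(weights, is_synthetic, is_minority, misclassified):
        if wrong:
            power = minority_boost if mino else 1.0
            if syn:
                w *= beta_syn ** power            # potential noise: decrease
            else:
                w *= (1.0 / beta_orig) ** power   # hard original instance: increase
        new_weights.append(w)
    return new_weights


def original_error_rate(weights, is_synthetic, misclassified):
    """Weighted error rate estimated on original (non-synthetic) data only."""
    total = sum(w for w, syn in zip(weights, is_synthetic) if not syn)
    wrong = sum(w for w, syn, bad in zip(weights, is_synthetic, misclassified)
                if not syn and bad)
    return wrong / total


# Toy example: one synthetic minority sample, one original minority instance,
# and two original majority instances, all starting with weight 1.0.
weights = [1.0, 1.0, 1.0, 1.0]
is_synthetic = [True, False, False, False]
is_minority = [True, True, False, False]
misclassified = [True, True, True, False]

error = original_error_rate(weights, is_synthetic, misclassified)
updated = update_weights(weights, is_synthetic, is_minority, misclassified)
print(error, updated)
```

In this toy run the misclassified synthetic sample loses weight, the misclassified original minority instance gains more weight than the misclassified majority instance, and the correctly classified instance is unchanged. The error rate ignores the synthetic sample entirely.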
Pages: 996-1009
Number of pages: 14
Related papers
50 in total
  • [41] An efficient algorithm coupled with synthetic minority over-sampling technique to classify imbalanced PubChem BioAssay data
    Hao, Ming
    Wang, Yanli
    Bryant, Stephen H.
    [J]. ANALYTICA CHIMICA ACTA, 2014, 806 : 117 - 127
  • [42] CORE: core-based synthetic minority over-sampling and borderline majority under-sampling technique
    Bunkhumpornpat, Chumphol
    Sinapiromsaran, Krung
    [J]. INTERNATIONAL JOURNAL OF DATA MINING AND BIOINFORMATICS, 2015, 12 (01) : 44 - 58
  • [43] Synthetic Minority Over-Sampling Technique based on Fuzzy C-means Clustering for Imbalanced Data
    Lee, Hansoo
    Jung, Seunghyan
    Kim, Minseok
    Kim, Sungshin
    [J]. 2017 INTERNATIONAL CONFERENCE ON FUZZY THEORY AND ITS APPLICATIONS (IFUZZY), 2017,
  • [44] A Comprehensive Analysis of Synthetic Minority Oversampling Technique (SMOTE) for handling class imbalance
    Elreedy, Dina
    Atiya, Amir F.
    [J]. INFORMATION SCIENCES, 2019, 505 : 32 - 64
  • [45] Handling Class-Imbalance with KNN (Neighbourhood) Under-Sampling for Software Defect Prediction
    Somya Goyal
    [J]. Artificial Intelligence Review, 2022, 55 : 2023 - 2064
  • [46] Ensemble learning via constraint projection and undersampling technique for class-imbalance problem
    Huaping Guo
    Jun Zhou
    Chang-an Wu
    [J]. Soft Computing, 2020, 24 : 4711 - 4727
  • [47] Ensemble learning via constraint projection and undersampling technique for class-imbalance problem
    Guo, Huaping
    Zhou, Jun
    Wu, Chang-An
    [J]. SOFT COMPUTING, 2020, 24 (07) : 4711 - 4727
  • [48] An Ensemble Learning-Based Undersampling Technique for Handling Class-Imbalance Problem
    Sarkar, Sobhan
    Khatedi, Nikhil
    Pramanik, Anima
    Maiti, J.
    [J]. PROCEEDINGS OF ICETIT 2019: EMERGING TRENDS IN INFORMATION TECHNOLOGY, 2020, 605 : 586 - 595
  • [49] Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning
    Sikora, Riyaz
    Lee, Yoon Sang
    [J]. INFORMATION SYSTEMS FRONTIERS, 2024,
  • [50] BCGAN: A CGAN-based over-sampling model using the boundary class for data balancing
    Minjae Son
    Seungwon Jung
    Seungmin Jung
    Eenjun Hwang
    [J]. The Journal of Supercomputing, 2021, 77 : 10463 - 10487