Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Times cited: 14
Authors
Liu, Xu-Ying [1, 2, 3]
Wang, Sheng-Tao [1, 2, 3]
Zhang, Min-Ling [1, 2, 3]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China
[3] Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
machine learning; data mining; class imbalance; over sampling; boosting; transfer learning; DATA-SETS; ENSEMBLES;
DOI
10.1007/s11704-018-7182-1
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and exploits transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, a misclassified synthetic sample has its weight decreased and a misclassified original sample has its weight increased, while correctly classified samples keep their weights unchanged. Misclassified synthetic samples are potential noise, so they have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, making minority instances more influential. Only the original data are used to estimate the error rate, so that the estimate is immune to noise. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
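The boosting loop described in the abstract can be illustrated with a small sketch. This is a hypothetical, minimal Python illustration, not the authors' TrasoBoost implementation. The following are all assumptions: the SMOTE-style interpolation used to create synthetic points, the decision-stump weak learner, the AdaBoost-style learner weight `alpha`, and the constant `beta > 1` that enlarges the weight change for minority instances. The helper names `smote_like`, `fit_stump`, `update_weights` and `traso_boost_sketch` are likewise invented for illustration. Only the qualitative rules come from the abstract:

- Misclassified synthetic samples lose weight and misclassified original samples gain weight.
- Correctly classified samples are left alone.
- Minority weights change more than majority weights.
- The error rate is computed on original data only.
- Every weak learner votes.

```python
import math
import random


def smote_like(minority, n_new, k=3, rng=None):
    """Assumed SMOTE-style generator: interpolate between a random minority
    point and one of its k nearest minority neighbours (needs >= 2 points)."""
    rng = rng or random.Random(0)
    synth = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [p for j, p in enumerate(minority) if j != i]
        nb = rng.choice(sorted(others, key=lambda p: math.dist(x, p))[:k])
        gap = rng.random()  # position along the segment x -> nb, in [0, 1)
        synth.append(tuple(a + gap * (b - a) for a, b in zip(x, nb)))
    return synth


def fit_stump(X, y, w):
    """Assumed weak learner: weighted decision stump (feature, threshold, sign)."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (pol if xi[f] >= t else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, f, t, pol)
    _, f, t, pol = best
    return lambda x: pol if x[f] >= t else -pol


def update_weights(w, wrong, is_synth, is_minority, alpha, beta=2.0):
    """Hypothetical rule following the abstract: only misclassified points
    change; synthetic ones shrink (likely noise), original ones grow, and
    minority points take a larger step (beta is an assumed constant > 1)."""
    new = []
    for wi, bad, syn, mino in zip(w, wrong, is_synth, is_minority):
        if not bad:
            new.append(wi)
            continue
        step = alpha * (beta if mino else 1.0)
        new.append(wi * math.exp(-step if syn else step))
    return new


def traso_boost_sketch(X, y, n_rounds=10, beta=2.0, k=3, seed=0):
    """Labels: +1 = minority, -1 = majority. Returns (predict, synthetic)."""
    n_orig = len(X)
    minority = [x for x, lab in zip(X, y) if lab == 1]
    n_new = sum(1 for lab in y if lab == -1) - len(minority)
    synth = smote_like(minority, n_new, k, random.Random(seed))  # balance sizes
    data = list(X) + synth
    labels = list(y) + [1] * len(synth)
    is_synth = [False] * n_orig + [True] * len(synth)
    is_min = [lab == 1 for lab in labels]
    w = [1.0] * len(data)
    ensemble = []
    for _ in range(n_rounds):
        total = sum(w)
        w = [wi / total for wi in w]
        h = fit_stump(data, labels, w)
        wrong = [h(x) != lab for x, lab in zip(data, labels)]
        # Error rate is estimated on the original data only.
        w_orig = sum(w[:n_orig])
        eps = sum(wi for wi, bad in zip(w[:n_orig], wrong[:n_orig]) if bad) / w_orig
        if eps >= 0.5:
            break
        eps = max(eps, 1e-10)  # avoid division by zero on a perfect round
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        w = update_weights(w, wrong, is_synth, is_min, alpha, beta)

    def predict(x):
        # All weak learners are aggregated, as in the abstract's final step.
        score = sum(a * h(x) for a, h in ensemble)
        return 1 if score > 0 else -1

    return predict, synth
```

On a toy 2-D set with 20 majority and 4 minority points, `traso_boost_sketch` creates 16 synthetic minority points before boosting. These points are the difference between the class sizes. In `update_weights`, the sign of the exponent is the only thing that distinguishes synthetic from original samples. This keeps the "down-weight likely noise, up-weight hard real data" idea from the abstract explicit.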
Pages: 996-1009
Number of pages: 14