Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Cited by: 14
Authors:
Liu, Xu-Ying [1 ,2 ,3 ]
Wang, Sheng-Tao [1 ,2 ,3 ]
Zhang, Min-Ling [1 ,2 ,3 ]
Affiliations:
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China
[3] Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Funding:
National Natural Science Foundation of China; National Key R&D Program of China
Keywords:
machine learning; data mining; class imbalance; over sampling; boosting; transfer learning; DATA-SETS; ENSEMBLES;
DOI
10.1007/s11704-018-7182-1
Chinese Library Classification: TP [Automation & Computer Technology]
Discipline code: 0812
Abstract
The problem of limited minority class data is encountered in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, since the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and exploits transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of misclassified synthetic samples decrease while the weights of misclassified original samples increase; correctly classified samples keep their weights unchanged. Misclassified synthetic samples are treated as potential noise and thus have less influence in the following iterations. Moreover, the weights of minority class instances change more than those of majority class instances, making the minority class more influential, and only the original data are used to estimate the error rate, which keeps the estimate immune to synthetic noise. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
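The boosting loop described in the abstract can be sketched as follows. This is a minimal illustration reconstructed from the abstract alone, not the authors' implementation: the decision-stump weak learner, the random-pair interpolation over-sampler (real SMOTE uses nearest neighbours), the TrAdaBoost-style shrink factor for synthetic-sample weights, and the minority scaling factor `k_min` are all assumptions filled in for concreteness.

```python
import numpy as np

def interp_oversample(X_min, n_new, rng):
    """SMOTE-style synthetic minority samples by interpolating random pairs.
    (SMOTE proper interpolates toward nearest neighbours; random pairs keep
    this sketch short.)"""
    i = rng.integers(0, len(X_min), n_new)
    j = rng.integers(0, len(X_min), n_new)
    lam = rng.random((n_new, 1))
    return X_min[i] + lam * (X_min[j] - X_min[i])

def fit_stump(X, y, w):
    """Exhaustive weighted decision stump; y must be in {+1, -1}."""
    best, best_err = (0, 0.0, 1), np.inf
    for f in range(X.shape[1]):
        for thr in np.unique(X[:, f]):
            for sign in (1, -1):
                pred = np.where(X[:, f] <= thr, sign, -sign)
                err = w[pred != y].sum()
                if err < best_err:
                    best_err, best = err, (f, thr, sign)
    return best

def stump_predict(model, X):
    f, thr, sign = model
    return np.where(X[:, f] <= thr, sign, -sign)

def trasoboost(X, y, T=10, seed=0, k_min=2.0):
    """TrasoBoost-style loop per the abstract; y in {+1 (minority), -1 (majority)}.
    k_min > 1 makes minority-class weight updates larger than majority-class
    ones (assumption: the exact update rule is not given in the abstract)."""
    rng = np.random.default_rng(seed)
    X_min, X_maj = X[y == 1], X[y == -1]
    n_syn = len(X_maj) - len(X_min)                 # balance the class sizes
    X_syn = interp_oversample(X_min, n_syn, rng)
    Xa = np.vstack([X, X_syn])
    ya = np.concatenate([y, np.ones(n_syn)])
    is_orig = np.concatenate([np.ones(len(y), bool), np.zeros(n_syn, bool)])
    w = np.ones(len(ya)) / len(ya)
    # TrAdaBoost-style fixed shrink factor for misclassified synthetic samples
    beta_syn = 1.0 / (1.0 + np.sqrt(2.0 * np.log(max(n_syn, 2)) / T))
    models, alphas = [], []
    for _ in range(T):
        model = fit_stump(Xa, ya, w / w.sum())
        pred = stump_predict(model, Xa)
        miss = pred != ya
        # error rate from ORIGINAL data only, to stay immune to synthetic noise
        eps = w[is_orig & miss].sum() / w[is_orig].sum()
        eps = min(max(eps, 1e-3), 0.499)
        alpha = 0.5 * np.log((1 - eps) / eps)
        scale = np.where(ya == 1, k_min, 1.0)       # minority updated more
        w[is_orig & miss] *= np.exp(alpha * scale[is_orig & miss])  # increase
        w[~is_orig & miss] *= beta_syn              # decrease: potential noise
        # correctly classified samples keep their weights unchanged
        models.append(model)
        alphas.append(alpha)
    def predict(Xq):                                # aggregate ALL weak learners
        score = sum(a * stump_predict(m, Xq) for a, m in zip(alphas, models))
        return np.sign(score)
    return predict
```

Note the two points the abstract emphasises: the error rate `eps` ignores synthetic samples entirely, and, unlike TrAdaBoost (which keeps only the later learners), every weak learner contributes to the final vote because the synthetic samples are closely related to the minority class.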
Pages: 996-1009 (14 pages)
Source: Frontiers of Computer Science, 2019, 13: 996-1009