Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Cited by: 14
Authors
Liu, Xu-Ying [1, 2, 3]
Wang, Sheng-Tao [1, 2, 3]
Zhang, Min-Ling [1, 2, 3]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 210096, Jiangsu, Peoples R China
[2] Southeast Univ, Minist Educ, Key Lab Comp Network & Informat Integrat, Nanjing 210096, Jiangsu, Peoples R China
[3] Collaborat Innovat Ctr Wireless Commun Technol, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
machine learning; data mining; class imbalance; over sampling; boosting; transfer learning; DATA-SETS; ENSEMBLES;
DOI
10.1007/s11704-018-7182-1
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The problem of limited minority class data arises in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising or by generating samples that are more consistent with the ground-truth data distribution, but their assumptions about the true noise or the ground-truth distribution may not hold. To adapt synthetic sampling to the problem of limited minority class data, the proposed Traso framework treats synthetic minority class samples as an additional data source and exploits transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weight of a misclassified synthetic sample decreases and the weight of a misclassified original instance increases, while the weights of correctly classified instances remain unchanged. Misclassified synthetic samples are potential noise and thus have less influence in subsequent iterations. In addition, the weights of minority class instances change more than those of majority class instances, making them more influential, and only the original data are used to estimate the error rate, so that the estimate is immune to noise. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
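The weighting scheme described in the abstract can be illustrated with a rough Python sketch. This is not the authors' TrasoBoost implementation: the abstract gives no formulas, so the update factors (`beta_syn`, `minority_boost`), the simple pairwise interpolation used in place of a real synthetic sampler, the decision-stump weak learner, and all function names are assumptions made only for illustration.

```python
import math
import random

def interpolate_minority(X_min, n_new, rng):
    """Assumed stand-in for a synthetic sampler: interpolate random minority pairs."""
    out = []
    for _ in range(n_new):
        a, b = rng.choice(X_min), rng.choice(X_min)
        t = rng.random()
        out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
    return out

def stump_predict(stump, x):
    f, thr, sign = stump
    return sign if x[f] > thr else -sign

def fit_stump(X, y, w):
    """Weighted decision stump (feature, threshold, sign) with minimum weighted error."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            for sign in (1, -1):
                cand = (f, thr, sign)
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(cand, x) != yi)
                if best is None or err < best[0]:
                    best = (err, cand)
    return best[1]

def traso_boost_sketch(X_maj, X_min, n_rounds=10, beta_syn=0.5,
                       minority_boost=2.0, seed=0):
    """Boosting loop following the abstract; all numeric factors are hypothetical."""
    rng = random.Random(seed)
    # Step 1: generate synthetic minority samples to balance the class sizes.
    X_syn = interpolate_minority(X_min, len(X_maj) - len(X_min), rng)
    X = X_maj + X_min + X_syn
    y = [-1] * len(X_maj) + [1] * (len(X_min) + len(X_syn))
    is_syn = [False] * (len(X_maj) + len(X_min)) + [True] * len(X_syn)
    w = [1.0 / len(X)] * len(X)
    ensemble = []
    for _ in range(n_rounds):
        total = sum(w)
        w = [wi / total for wi in w]
        stump = fit_stump(X, y, w)
        miss = [stump_predict(stump, x) != yi for x, yi in zip(X, y)]
        # The error rate is estimated on original data only.
        orig_w = sum(wi for wi, s in zip(w, is_syn) if not s)
        err = sum(wi for wi, s, m in zip(w, is_syn, miss) if m and not s) / orig_w
        err = min(max(err, 1e-6), 0.499)  # keep alpha finite and positive
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        for i, missed in enumerate(miss):
            if not missed:
                continue  # correctly classified: weight unchanged
            if is_syn[i]:
                w[i] *= beta_syn  # misclassified synthetic sample: down-weight
            else:
                # Misclassified original instance: up-weight, more for the minority.
                w[i] *= math.exp(alpha * (minority_boost if y[i] == 1 else 1.0))
    return ensemble

def predict(ensemble, x):
    """Aggregate all weak learners by weighted vote."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score > 0 else -1
```

A call such as `predict(traso_boost_sketch(X_maj, X_min), x)` returns `1` for the minority class and `-1` for the majority class. The sketch boosts only original minority instances by `minority_boost`; how the paper scales the updates for each class is not stated in the abstract.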
Pages: 996-1009
Number of pages: 14
Related papers
50 in total
  • [22] Class-Imbalance Adversarial Transfer Learning Network for Cross-Domain Fault Diagnosis With Imbalanced Data
    Kuang, Jiachen
    Xu, Guanghua
    Tao, Tangfei
    Wu, Qingqiang
    IEEE TRANSACTIONS ON INSTRUMENTATION AND MEASUREMENT, 2022, 71
  • [23] Distributed Sparse Class-Imbalance Learning and Its Applications
    Maurya, Chandresh Kumar
    Toshniwal, Durga
    Venkoparao, Gopalan Vijendran
    IEEE TRANSACTIONS ON BIG DATA, 2021, 7 (05) : 832 - 844
  • [24] METAbolomics data Balancing with Over-sampling Algorithms (META-BOA): an online resource for addressing class imbalance
    Hashimoto-Roth, Emily
    Surendra, Anuradha
    Lavallee-Adam, Mathieu
    Bennett, Steffany A. L.
    Cuperlovic-Culf, Miroslava
    BIOINFORMATICS, 2022, 38 (23) : 5326 - 5327
  • [25] A Method for Class-Imbalance Learning in Android Malware Detection
    Guan, Jun
    Jiang, Xu
    Mao, Baolei
    ELECTRONICS, 2021, 10 (24)
  • [26] Safe Level Graph for Synthetic Minority Over-sampling Techniques
    Bunkhumpornpat, Chumphol
    Subpaiboonkit, Sitthichoke
    2013 13TH INTERNATIONAL SYMPOSIUM ON COMMUNICATIONS AND INFORMATION TECHNOLOGIES (ISCIT): COMMUNICATION AND INFORMATION TECHNOLOGY FOR NEW LIFE STYLE BEYOND THE CLOUD, 2013, : 570 - 575
  • [27] Towards Mitigating the Class-Imbalance Problem for Partial Label Learning
    Wang, Jing
    Zhang, Min-Ling
    KDD'18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2018, : 2427 - 2436
  • [28] Handling Autism Imbalanced Data using Synthetic Minority Over-Sampling Technique (SMOTE)
    El-Sayed, Asmaa Ahmed
    Meguid, Nagwa Abdel
    Mahmood, Mahmood Abdel Manem
    Hefny, Hesham Ahmed
    PROCEEDINGS OF 2015 THIRD IEEE WORLD CONFERENCE ON COMPLEX SYSTEMS (WCCS), 2015,
  • [29] LVQ-SMOTE - Learning Vector Quantization based Synthetic Minority Over-sampling Technique for biomedical data
    Nakamura, Munehiro
    Kajiwara, Yusuke
    Otsuka, Atsushi
    Kimura, Haruhiko
    BIODATA MINING, 2013, 6
  • [30] Large-Scale Distributed Sparse Class-Imbalance Learning
    Maurya, Chandresh Kumar
    Toshniwal, Durga
    INFORMATION SCIENCES, 2018, 456 : 1 - 12