Transfer synthetic over-sampling for class-imbalance learning with limited minority class data

Cited by: 14

Authors:

Liu, Xu-Ying [1, 2, 3]

Wang, Sheng-Tao [1, 2, 3]

Zhang, Min-Ling [1, 2, 3]

Affiliations:

[1] Southeast University, School of Computer Science and Engineering, Nanjing 210096, Jiangsu, People's Republic of China

[2] Southeast University, Key Laboratory of Computer Network and Information Integration (Ministry of Education), Nanjing 210096, Jiangsu, People's Republic of China

[3] Collaborative Innovation Center of Wireless Communications Technology, Nanjing 210096, Jiangsu, People's Republic of China

Source:

FRONTIERS OF COMPUTER SCIENCE | 2019 / Vol. 13 / Issue 5

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

machine learning; data mining; class imbalance; over sampling; boosting; transfer learning; DATA-SETS; ENSEMBLES;

DOI:

10.1007/s11704-018-7182-1

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812

Abstract:

The problem of limited minority-class data is encountered in many class-imbalanced applications but has received little attention. Synthetic over-sampling, a popular family of class-imbalance learning methods, can introduce considerable noise when the minority class has limited data, because the synthetic samples are not i.i.d. samples of the minority class. Most sophisticated synthetic sampling methods tackle this problem by denoising, or by generating samples more consistent with the ground-truth data distribution. However, their assumptions about the true noise or the ground-truth data distribution may not hold. To adapt synthetic sampling to the problem of limited minority-class data, the proposed Traso framework treats synthetic minority-class samples as an additional data source and exploits transfer learning to transfer knowledge from them to the minority class. As an implementation, the TrasoBoost method first generates synthetic samples to balance the class sizes. Then, in each boosting iteration, the weights of misclassified synthetic samples decrease and those of misclassified original samples increase, while the weights of correctly classified samples remain unchanged. Misclassified synthetic samples are potential noise and thus have less influence in subsequent iterations. In addition, the weights of minority-class instances change more than those of majority-class instances, making them more influential. Moreover, only the original data are used to estimate the error rate, so that the estimate is immune to noise from the synthetic samples. Finally, since the synthetic samples are highly related to the minority class, all of the weak learners are aggregated for prediction. Experimental results show that TrasoBoost outperforms many popular class-imbalance learning methods.
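The reweighting rules listed in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the update formulas, so the concrete factors below are assumptions in the style of AdaBoost/TrAdaBoost. The names `traso_reweight`, `learner_weight`, and `predict`, and the parameters `beta_syn` (shrink factor for misclassified synthetic samples) and `minority_power` (exponent that makes minority-class weights change more), are hypothetical. Labels are assumed to be in {-1, +1}.

```python
import math
from typing import Callable, List, Sequence, Tuple


def traso_reweight(
    weights: Sequence[float],
    is_synthetic: Sequence[bool],
    is_minority: Sequence[bool],
    misclassified: Sequence[bool],
    beta_syn: float = 0.5,        # hypothetical shrink factor (< 1) for synthetic samples
    minority_power: float = 2.0,  # hypothetical exponent (> 1): minority weights change more
) -> Tuple[List[float], float]:
    """One boosting reweighting step in the spirit of the TrasoBoost abstract.

    Assumed rules (hypothetical, not the paper's exact formulas):
      * the error rate eps is measured on original (non-synthetic) samples only;
      * misclassified original samples are multiplied by (1 - eps) / eps;
      * misclassified synthetic samples are multiplied by beta_syn;
      * for minority-class samples the factor is raised to minority_power;
      * correctly classified samples keep their weight before normalization.
    """
    # Weighted error on original data only, so synthetic noise cannot inflate it.
    orig_total = sum(w for w, syn in zip(weights, is_synthetic) if not syn)
    orig_err = sum(
        w for w, syn, miss in zip(weights, is_synthetic, misclassified)
        if not syn and miss
    )
    # Clamp to (0, 0.5]; at 0.5 the original-sample factor is 1 (no change).
    eps = min(max(orig_err / orig_total, 1e-10), 0.5)
    up = (1.0 - eps) / eps  # >= 1: raises weights of misclassified original data

    new_weights = []
    for w, syn, minority, miss in zip(weights, is_synthetic, is_minority, misclassified):
        if miss:
            factor = beta_syn if syn else up  # decrease synthetic, increase original
            if minority:
                factor **= minority_power     # larger change for minority instances
            w *= factor
        new_weights.append(w)
    z = sum(new_weights)
    return [w / z for w in new_weights], eps


def learner_weight(eps: float) -> float:
    """AdaBoost-style vote alpha = 0.5 * ln((1 - eps) / eps); an assumption."""
    return 0.5 * math.log((1.0 - eps) / eps)


def predict(learners: List[Callable[[float], int]], alphas: List[float], x: float) -> int:
    """Weighted vote over ALL weak learners, as the abstract states; labels in {-1, +1}."""
    score = sum(a * h(x) for h, a in zip(learners, alphas))
    return 1 if score >= 0 else -1
```

In a full training loop one would fit a weak learner on the current weights, call `traso_reweight` with that learner's mistakes, store `learner_weight(eps)`, and at test time vote with `predict` over every learner. Computing `eps` only on original samples is what keeps noisy synthetic points from distorting each learner's vote.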


Pages: 996-1009

Number of pages: 14
