Unbalanced regression sample generation algorithm based on confrontation

Cited by: 9
Authors
Tian, Huixin [1, 2]
Tian, Chunzhi [1, 2]
Li, Kun [3]
Jia, Weinan [4]
Affiliations
[1] Tiangong Univ, Sch Control Sci & Engn, Tianjin 300387, Peoples R China
[2] Tiangong Univ, Tianjin Key Lab Intelligent Control Elect Equipmen, Tianjin 300387, Peoples R China
[3] Tiangong Univ, Sch Econ & Management, Tianjin 300387, Peoples R China
[4] Wenzhou Business Coll, Sch Informat Engn, Wenzhou 325035, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Imbalanced regression; DIRVAE; RLSTM; RVAE-GAN; AUTO-ENCODER; NETWORK;
DOI
10.1016/j.ins.2023.119157
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Data imbalance is an issue because the number of samples in different categories or target value ranges varies significantly. Numerous studies have been developed to address the data imbalance problem in classification samples. However, the issue of data imbalance in regression samples has not been researched well. The distribution of the target value of regression samples with the unbalanced data problem is more complicated than classification samples with the unbalanced data problem due to the continuity of the target values of regression samples. To solve this problem, we defined three basic modes of the data imbalance problem of regression samples: PSIR-mode (Positive Skewed Imbalanced Regression-mode), UNIR-mode (Un-Normal Imbalanced Regression-mode) and NSIR-mode (Negative Skewed Imbalanced Regression-mode). Any regression samples having data imbalance problems with complex target value distributions can be split into these three modes. To solve the data imbalance problem in regression samples, we proposed the DIRVAE (Deep Imbalanced Regression Variational Autoencoder) algorithm to generate missing and minority samples. The model can learn the distribution information of the original sample and the sample information between adjacent samples in the target value distribution. Experiments in biology, medicine and aerospace have proved the superiority of the model.
Pages: 18