A Generative Neighborhood-Based Deep Autoencoder for Robust Imbalanced Classification

Cited by: 3
Authors
Troullinou E. [1, 2]
Tsagkatakis G. [1, 2]
Losonczy A. [3, 4]
Poirazi P. [5]
Tsakalides P. [1, 2]
Affiliations
[1] University of Crete, Department of Computer Science, Heraklion
[2] Institute of Computer Science - FORTH, Heraklion
[3] Columbia University, Department of Neuroscience, New York, 10027, NY
[4] Mortimer B. Zuckerman Mind Brain Behavior Institute, New York, 10027, NY
[5] Foundation for Research and Technology Hellas, Institute of Molecular Biology and Biotechnology, Heraklion
Source
IEEE Transactions on Artificial Intelligence, 2024, Vol. 5, No. 1
Keywords
Data augmentation; image data; imbalanced classification; latent space; time-series data
DOI
10.1109/TAI.2023.3249685
Abstract
Deep learning models have recently achieved remarkable performance on many classification tasks. The superior performance of deep neural networks relies on large amounts of training data, which, for training to be efficient, must also have a balanced class distribution. However, in most real-world applications, labeled data may be limited and highly imbalanced across classes; this adversely affects the learning process of most classification algorithms, resulting in unstable predictions and low performance. Approaches to imbalanced learning fall into three main categories: data-level methods, algorithm-level methods, and hybrid methods that combine the two. Data-level generative methods are typically based on generative adversarial networks, which require significant amounts of data, while algorithm-level methods require extensive domain expertise to craft the learning objectives, making them less accessible to users without such knowledge. Moreover, the vast majority of these approaches are designed for and applied to imaging applications, less often to time series, and only rarely to both. To address these issues, we introduce GENDA, a generative neighborhood-based deep autoencoder that is simple yet effective in design and can be successfully applied to both image and time-series data. GENDA learns latent representations that rely on the neighboring embedding space of the samples. Extensive experiments conducted on a variety of widely used real datasets demonstrate the efficacy of the proposed method. © 2020 IEEE.
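The abstract says only that GENDA learns latent representations that rely on the neighboring embedding space of the samples; it gives no algorithmic detail. As a minimal, hedged illustration of the general idea of generating minority-class samples from the neighborhood of existing embeddings, the sketch below applies SMOTE-style interpolation to latent codes. It is an assumption, not the GENDA method: the function names `k_nearest` and `interpolate_latent_neighbors`, the neighbor count `k`, and the uniform interpolation weight are all hypothetical choices, and the encoder and decoder that would produce and consume the codes are omitted.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def k_nearest(index: int, points: Sequence[Vector], k: int) -> List[int]:
    """Hypothetical helper: indices of the k nearest points to points[index]
    by Euclidean distance, excluding the point itself (ties broken by index)."""
    anchor = points[index]
    dists = [(math.dist(anchor, p), j) for j, p in enumerate(points) if j != index]
    dists.sort()
    return [j for _, j in dists[:k]]


def interpolate_latent_neighbors(
    latent: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0
) -> List[Vector]:
    """Hypothetical SMOTE-style generator operating on latent codes.

    For each synthetic code: pick a random minority-class code z, pick one of
    its k nearest minority neighbors z_nn, and return z + lam * (z_nn - z)
    with lam drawn uniformly from [0, 1). This is NOT the GENDA algorithm; it
    only illustrates sampling from the neighborhood of existing embeddings.
    """
    if len(latent) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)          # seeded for reproducibility
    k = min(k, len(latent) - 1)        # cannot have more neighbors than other points
    synthetic: List[Vector] = []
    for _ in range(n_new):
        i = rng.randrange(len(latent))
        j = rng.choice(k_nearest(i, latent, k))
        lam = rng.random()
        z, z_nn = latent[i], latent[j]
        # Point on the segment between z and its chosen neighbor.
        synthetic.append([a + lam * (b - a) for a, b in zip(z, z_nn)])
    return synthetic
```

For example, `interpolate_latent_neighbors([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], n_new=3, k=2)` returns three 2-D codes, each lying on a segment between a minority code and one of its nearest minority neighbors. In an autoencoder pipeline, each new code would then be passed through a trained decoder to obtain a synthetic image or time series. Interpolating in a learned latent space rather than in the raw input space is what distinguishes this sketch from plain SMOTE.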
Pages: 80-91
Page count: 11