Data augmentation using MG-GAN for improved cancer classification on gene expression data

Authors
Poonam Chaudhari
Himanshu Agrawal
Ketan Kotecha
Affiliations
[1] Gokhale Education Society’s R. H. Sapat College of Engineering, Management Studies and Research
[2] Symbiosis Institute of Technology
Source
Soft Computing | 2020 / Vol. 24
Keywords
Data augmentation; Generative adversarial network; Gene expression dataset; Cancer detection; Modified generator GAN; Multivariate noise; Gaussian distribution; Latent space; Saddle point
Abstract
Molecular biology studies of cancer that use gene expression data have revealed that these datasets contain very few samples. Medical data are difficult and expensive to obtain because of privacy constraints, and the accuracy of a classifier depends heavily on the quality and quantity of its input data. Data augmentation has been used to address this small-sample problem. Because cancer classification on gene expression data is sensitive to the synthetic samples it is trained on, this paper investigates data augmentation using a generative adversarial network (GAN). A GAN consists of two blocks, a generator and a discriminator, that work in a collaborative yet adversarial way. This paper proposes a modified generator GAN (MG-GAN), in which the generator is fed with the original data together with multivariate noise to generate data with a Gaussian distribution. Because the generated data lie within the latent space, training reaches the saddle point faster. GANs have been widely used for augmenting image datasets; to the best of our knowledge, this is the first attempt to use a GAN for augmenting gene expression datasets. The performance of the proposed MG-GAN was compared with KNN-based augmentation and a basic GAN: MG-GAN improves classification accuracy by 18.8% over KNN and by 11.9% over the basic GAN. The loss value of the error function for MG-GAN is drastically reduced, from 0.6978 to 0.0082, ensuring the sensitivity of the generated data. The improved classification accuracy and the reduced loss value make MG-GAN better suited to critical applications involving sensitive data.
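The abstract describes the MG-GAN generator only at a high level: the generator takes the original data together with multivariate noise, and it is trained adversarially against a discriminator. The NumPy code below is a minimal sketch of that input arrangement. It is not the authors' architecture or training setup, which the abstract does not specify. It rests on these assumptions: both networks are single linear layers, the discriminator is a logistic regression, the noise is standard multivariate normal N(0, I), and the generator uses the common non-saturating loss. The names `TinyMGGAN`, `generate` and `train_step`, the learning rate, and the toy data matrix are all hypothetical.

```python
import numpy as np


def sigmoid(a):
    # Clip logits to avoid overflow in exp for large magnitudes.
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30.0, 30.0)))


class TinyMGGAN:
    """Hypothetical linear MG-GAN-style sketch: the generator sees [x_real, z]."""

    def __init__(self, n_genes, noise_dim, lr=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.noise_dim = noise_dim
        self.lr = lr
        # Linear generator: (n_genes + noise_dim) -> n_genes.
        self.W_g = self.rng.normal(0.0, 0.1, (n_genes + noise_dim, n_genes))
        self.b_g = np.zeros(n_genes)
        # Logistic-regression discriminator: n_genes -> P(sample is real).
        self.w_d = self.rng.normal(0.0, 0.1, n_genes)
        self.b_d = 0.0

    def generate(self, x_real):
        # Multivariate Gaussian noise N(0, I), one vector per original sample.
        z = self.rng.multivariate_normal(
            np.zeros(self.noise_dim), np.eye(self.noise_dim), size=len(x_real))
        g_in = np.hstack([x_real, z])  # original data concatenated with noise
        return g_in, g_in @ self.W_g + self.b_g

    def discriminate(self, x):
        return sigmoid(x @ self.w_d + self.b_d)

    def train_step(self, x_real, eps=1e-8):
        n = len(x_real)
        g_in, x_fake = self.generate(x_real)

        # Discriminator step: binary cross-entropy with real -> 1, fake -> 0.
        d_real, d_fake = self.discriminate(x_real), self.discriminate(x_fake)
        d_loss = -np.mean(np.log(d_real + eps)) - np.mean(np.log(1.0 - d_fake + eps))
        grad_real, grad_fake = d_real - 1.0, d_fake  # dBCE/dlogit = p - y
        self.w_d -= self.lr * (x_real.T @ grad_real + x_fake.T @ grad_fake) / n
        self.b_d -= self.lr * (grad_real.sum() + grad_fake.sum()) / n

        # Generator step: non-saturating loss -log D(G([x, z])).
        d_fake = self.discriminate(x_fake)
        g_loss = -np.mean(np.log(d_fake + eps))
        grad_x_fake = np.outer(d_fake - 1.0, self.w_d)  # dL/dx_fake
        self.W_g -= self.lr * (g_in.T @ grad_x_fake) / n
        self.b_g -= self.lr * grad_x_fake.mean(axis=0)
        return d_loss, g_loss


# Toy stand-in for a small gene expression matrix (samples x genes); not real data.
data_rng = np.random.default_rng(42)
x_train = data_rng.normal(loc=2.0, scale=0.5, size=(20, 6))

gan = TinyMGGAN(n_genes=6, noise_dim=4)
for _ in range(200):
    d_loss, g_loss = gan.train_step(x_train)

# Augment: one synthetic sample per original sample, stacked with the originals.
_, x_synth = gan.generate(x_train)
x_augmented = np.vstack([x_train, x_synth])
print(x_augmented.shape)  # (40, 6)
```

The sketch uses a linear generator because it makes one property easy to see. For a fixed original sample, each synthetic sample is an affine function of Gaussian noise, so it is itself Gaussian. This is one simple way to read "generate data with Gaussian distribution". A practical model would likely use nonlinear layers and a held-out evaluation of the downstream classifier.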
Pages: 11381-11391
Number of pages: 10