Clustering of mixed datasets using deep learning algorithm

Cited by: 6
Authors
Balaji, K. [1]
Lavanya, K. [1]
Mary, A. Geetha [1]
Affiliation
[1] VIT Univ, Sch Comp Sci & Engn, Vellore, Tamil Nadu, India
Keywords
Deep learning; Mixed data; Generative adversarial networks; Clustering loss; INFORMATION
DOI
10.1016/j.chemolab.2020.104123
CLC classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
The performance of a clustering algorithm depends heavily on the quality and quantity of the training dataset. Deep learning is one of the most popular and successful techniques for clustering high-quality datasets. Most real datasets contain a mix of numeric and categorical attributes, and clustering such heterogeneous data is a complex problem. Deep learning methods, which are state-of-the-art classifiers, can fill this gap given better learning procedures and greater computational resources. To improve the robustness of the clusters, we propose a Constraint-Based Deep Convolutional Generative Adversarial Network (CB-DCGANs) framework. The framework generates simulated data to augment the training set, which improves the performance of the clustering algorithm. We evaluated how well an end-to-end Deep Convolutional Neural Network (DCNN) detects the clusters in the given datasets. CB-DCGANs combined with the DCNN yielded a baseline accuracy of 0.8853 on the heart disease dataset. On the chemoinformatics datasets, the proposed algorithm yielded accuracies of 0.965 on the Kaggle dataset, 0.987 on the factors dataset, and 0.952 on the kinase dataset. This study shows that using generative adversarial networks to augment clustering data can significantly improve performance, especially in real-life applications.
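The abstract does not say how the mixed numeric and categorical attributes are fed to the CB-DCGANs or the DCNN. A common preprocessing step for neural models on mixed data is to standardize numeric columns and one-hot encode categorical columns. The result is a fixed-length float vector per record. The sketch below shows that step under this assumption only; it is not the authors' method. The function name `encode_mixed` and the column names in the example (`age`, `chest_pain`) are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple, Union

Value = Union[float, int, str]


def encode_mixed(
    rows: Sequence[Dict[str, Value]],
    numeric: List[str],
    categorical: List[str],
) -> List[List[float]]:
    """Hypothetical helper: map mixed-type records to float vectors.

    Assumed scheme (not taken from the paper):
    - numeric columns are z-score standardized (population std dev);
    - categorical columns are one-hot encoded over their sorted observed levels.
    """
    # Per-column mean and standard deviation; a constant column gets sd = 1.0
    # so that division never fails.
    stats: Dict[str, Tuple[float, float]] = {}
    for col in numeric:
        vals = [float(r[col]) for r in rows]
        sd = pstdev(vals)
        stats[col] = (mean(vals), sd if sd > 0 else 1.0)

    # Sorted observed levels give every record the same one-hot layout.
    levels = {col: sorted({str(r[col]) for r in rows}) for col in categorical}

    encoded: List[List[float]] = []
    for r in rows:
        # Standardized numeric features first, in the order given.
        vec = [(float(r[col]) - stats[col][0]) / stats[col][1] for col in numeric]
        # Then one indicator per observed level of each categorical column.
        for col in categorical:
            vec.extend(1.0 if str(r[col]) == lvl else 0.0 for lvl in levels[col])
        encoded.append(vec)
    return encoded


if __name__ == "__main__":
    # Toy records with hypothetical column names, not data from the paper.
    toy = [{"age": 40, "chest_pain": "typical"}, {"age": 60, "chest_pain": "atypical"}]
    print(encode_mixed(toy, ["age"], ["chest_pain"]))
```

For the two toy records, `age` maps to -1.0 and 1.0. `chest_pain` expands into two indicator columns ordered as `atypical`, `typical`. A vector like this is one possible input to a clustering or generative model.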
Pages: 11