Mixed-type data generation method based on generative adversarial networks

Cited by: 4
Authors
Wei, Ning [1]
Wang, Longzhi [1]
Chen, Guanhua [1]
Wu, Yirong [1, 2]
Sun, Shunfa [1]
Chen, Peng [1]
Affiliations
[1] China Three Gorges Univ, Coll Comp & Informat Technol, Yichang 443002, Peoples R China
[2] Beijing Normal Univ, Ctr Governance Studies, Zhuhai 519087, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Generative adversarial network; Autoencoder; Mixed type data;
DOI
10.1186/s13638-022-02105-7
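Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline classification code
0808 ; 0809 ;
Abstract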
Data-driven deep learning has become a key research direction in artificial intelligence, and abundant training data is a prerequisite for building efficient and accurate models. However, privacy protection policies often prevent research institutions from obtaining large amounts of training data, which leads to a shortage of training sets. In this paper, a mixed-type data generation model based on generative adversarial networks is proposed to synthesize fake data that follow the same distribution as the real data, so as to supplement the real data and increase the number of available samples. The model first pre-trains an autoencoder that maps the given dataset into a low-dimensional continuous space. Then, a generator constructed in the low-dimensional space is trained adversarially against a discriminator constructed in the original space. Because the discriminator considers the loss on both the continuous attributes and the labeled attributes, the generator network formed by the generator and the decoder can effectively learn the intrinsic distribution of the mixed data. We evaluate the proposed method on both the independent distribution of each attribute and the relationships among attributes, and the experimental results show that the proposed generation method preserves the intrinsic distribution better than other deep-learning-based generation algorithms.
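The abstract mentions evaluating the independent distribution of each attribute but does not name a metric. As a minimal sketch, assuming total variation distance is an acceptable way to compare the marginal of one labeled (categorical) attribute in real and synthetic data, the check might look like the following. The function names and the toy columns are hypothetical and are not taken from the paper.

```python
from collections import Counter


def category_frequencies(values):
    """Return the relative frequency of each category in a column."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}


def total_variation_distance(real_column, synthetic_column):
    """Total variation distance between two categorical marginals.

    0.0 means the two columns have identical category frequencies;
    1.0 means they share no categories at all.
    """
    real_freq = category_frequencies(real_column)
    synth_freq = category_frequencies(synthetic_column)
    categories = set(real_freq) | set(synth_freq)
    return 0.5 * sum(
        abs(real_freq.get(c, 0.0) - synth_freq.get(c, 0.0)) for c in categories
    )


# Hypothetical toy columns for illustration only (not data from the paper).
real = ["a", "a", "b", "c"]
synthetic = ["a", "b", "b", "c"]
distance = total_variation_distance(real, synthetic)
```

A smaller distance suggests the synthetic column tracks the real marginal more closely. This covers only the per-attribute view; the relationships among attributes would need a separate, joint comparison.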
Pages: 11