Deconstructing Generative Adversarial Networks

Cited by: 23
Authors
Zhu, Banghua [1 ]
Jiao, Jiantao [1 ]
Tse, David [2 ]
Affiliations
[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Keywords
Generative Adversarial Networks (GANs); Wasserstein distance; optimal transport; generalization error; information-theoretic limit; robust statistics; total variation (TV) distance; generators; perturbation methods; convergence
DOI
10.1109/TIT.2020.2983698
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Generative Adversarial Networks (GANs) are a thriving unsupervised machine learning technique that has led to significant advances in fields such as computer vision and natural language processing. However, GANs are known to be difficult to train and usually suffer from mode collapse and the discriminator winning problem. To interpret the empirical observations of GANs and to design better ones, we deconstruct the study of GANs into three components and make the following contributions.

- Formulation: we propose a perturbation view of the population target of GANs. Building on this interpretation, we connect GANs to the robust statistics framework and propose a novel GAN architecture, termed Cascade GANs, which provably recovers meaningful low-dimensional generator approximations when the real distribution is high-dimensional and corrupted by outliers.

- Generalization: given a population target of GANs, we design a systematic principle, projection under admissible distance, for constructing GANs that meet the population requirement using only finite samples. We implement our principle in three cases to achieve polynomial and sometimes near-optimal sample complexities: (1) learning an arbitrary generator under an arbitrary pseudonorm; (2) learning a Gaussian location family under total variation distance, where we use our principle to give a new proof of the near-optimality of the Tukey median viewed as a GAN; (3) learning a low-dimensional Gaussian approximation of an arbitrary high-dimensional distribution under Wasserstein distance. We demonstrate a fundamental trade-off between approximation error and statistical error in GANs, and show how to apply our principle in practice, with only empirical samples, to predict how many samples suffice for GANs to avoid the discriminator winning problem.

- Optimization: we demonstrate that alternating gradient descent is provably not locally asymptotically stable in optimizing the GAN formulation of PCA (a toy sketch of this instability follows the abstract). We find that a non-zero minimax duality gap may be one of the causes, and we propose a new GAN architecture whose duality gap is zero and whose game value equals the original minimax value (not the maximin value). We prove that the new architecture is globally asymptotically stable in solving PCA under alternating gradient descent.
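As a hedged illustration of the points above, not a reproduction of the paper's actual constructions: the projection principle can be read schematically as returning nu_hat = argmin over nu in the generator class G of L(nu, mu_n), where mu_n is the empirical distribution and L is an admissible distance; and the instability of alternating gradient descent can be seen on the classic bilinear game min_x max_y x*y, whose unique equilibrium at the origin the iterates orbit rather than approach. The following minimal Python sketch (this toy game is our assumption for illustration, not the paper's PCA formulation) shows the latter.

```python
# Hedged sketch: alternating gradient descent on the toy bilinear game
# min_x max_y f(x, y) = x * y. This is NOT the paper's PCA game; it is a
# standard minimal example of the failure mode the abstract describes:
# the equilibrium (0, 0) is not asymptotically stable, so the iterates
# circle it instead of converging.

import numpy as np

def alternating_gd(x0, y0, lr=0.1, steps=2000):
    """Alternate a descent step on x with an ascent step on y for f(x, y) = x*y."""
    x, y = x0, y0
    dists = []
    for _ in range(steps):
        x = x - lr * y          # descent step on x: df/dx = y
        y = y + lr * x          # ascent step on y: df/dy = x (uses the updated x)
        dists.append(np.hypot(x, y))
    return np.array(dists)

dists = alternating_gd(1.0, 1.0)
# The distance to the equilibrium oscillates but never decays to zero:
# the linearized update has determinant 1 and complex eigenvalues on the
# unit circle, so the orbit is bounded away from the origin.
print(dists[::500])
```

Running the sketch prints distances that hover near their starting magnitude across thousands of steps, consistent with marginal (cycling) rather than asymptotic stability.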
Pages: 7155-7179
Number of pages: 25