Enhancing Variational Generation Through Self-Decomposition

Cited by: 2
Authors
Asperti, Andrea [1 ]
Bugo, Laura [1 ]
Filippini, Daniele [1 ]
Affiliations
[1] University of Bologna, Department of Computer Science and Engineering (DISI), I-40126 Bologna, Italy
Keywords
Deep learning; generative modeling; multi-layer neural networks; representation learning; unsupervised learning; variational autoencoder
DOI
10.1109/ACCESS.2022.3185654
Chinese Library Classification
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
In this article we introduce the notion of the Split Variational Autoencoder (SVAE), whose output $\hat{x}$ is obtained as a weighted sum $\sigma \odot \hat{x}_1 + (1 - \sigma) \odot \hat{x}_2$ of two generated images $\hat{x}_1, \hat{x}_2$, where $\sigma$ is a learned compositional map. The composing images $\hat{x}_1, \hat{x}_2$, as well as the $\sigma$-map, are automatically synthesized by the model. The network is trained as a usual Variational Autoencoder with a negative log-likelihood loss between training and reconstructed images; no additional loss is required for $\hat{x}_1$, $\hat{x}_2$ or $\sigma$, nor any form of human tuning. The decomposition is nondeterministic, but follows two main schemes that we may roughly categorize as either "syntactic" or "semantic." In the first case, the map tends to exploit the strong correlation between adjacent pixels, splitting the image into two complementary high-frequency sub-images. In the second case, the map typically focuses on the contours of objects, splitting the image into interesting variations of its content, with more marked and distinctive features. In this case, according to empirical observations, the Fréchet Inception Distance (FID) of $\hat{x}_1$ and $\hat{x}_2$ is usually lower (hence better) than that of $\hat{x}$, which clearly suffers from being the average of the former. In a sense, an SVAE forces the Variational Autoencoder to make choices, in contrast with its intrinsic tendency to average between alternatives in order to minimize the reconstruction loss towards a specific sample. According to the FID metric, our technique, tested on typical datasets such as MNIST, CIFAR-10, and CelebA, allows us to outperform all previous purely variational architectures (not relying on normalizing flows).
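
To make the composition mechanism concrete, the following is a minimal sketch, in PyTorch, of how a split decoder could be wired and trained as described in the abstract: the decoder emits the two candidate images and the sigma-map, and only the composed output enters the usual VAE loss. The class name SplitVAE, the fully connected layers, and all hyperparameters are hypothetical choices for illustration, not the architecture used in the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitVAE(nn.Module):
    def __init__(self, latent_dim=64, img_channels=1, img_size=28):
        super().__init__()
        self.img_shape = (img_channels, img_size, img_size)
        flat = img_channels * img_size * img_size
        # Standard Gaussian encoder q(z|x).
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(flat, 512), nn.ReLU())
        self.fc_mu = nn.Linear(512, latent_dim)
        self.fc_logvar = nn.Linear(512, latent_dim)
        # The decoder emits three image-shaped maps: the two candidate
        # reconstructions x1, x2 and the compositional sigma-map.
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, 3 * flat),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        out = self.decoder(z).view(x.size(0), 3, *self.img_shape)
        x1, x2 = torch.sigmoid(out[:, 0]), torch.sigmoid(out[:, 1])
        sigma = torch.sigmoid(out[:, 2])          # learned compositional map in [0, 1]
        x_hat = sigma * x1 + (1.0 - sigma) * x2   # weighted sum of the two images
        return x_hat, mu, logvar

def loss_fn(x_hat, x, mu, logvar):
    # Only the composed output x_hat enters the loss: a negative
    # log-likelihood term (binary cross-entropy here) plus the usual KL
    # term; no additional loss is placed on x1, x2 or sigma.
    rec = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

# Usage (hypothetical): images scaled to [0, 1], e.g. MNIST-sized inputs.
model = SplitVAE()
x = torch.rand(16, 1, 28, 28)
x_hat, mu, logvar = model(x)
loss_fn(x_hat, x, mu, logvar).backward()

Because the gradient reaches x1, x2 and sigma only through the composed output, the decomposition emerges on its own, which is consistent with the abstract's claim that no extra loss or human tuning is needed.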
Pages: 67510-67520
Number of pages: 11