On the detrimental effect of invariances in the likelihood for variational inference

Citations: 0
Authors
Kurle, Richard [1]
Herbrich, Ralf [2]
Januschowski, Tim [1,3]
Wang, Yuyang [1]
Gasthaus, Jan [1]
Affiliations
[1] AWS AI Labs, Seattle, WA 98019 USA
[2] Hasso Plattner Inst, Potsdam, Germany
[3] Zalando SE, Berlin, Germany
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022 | 2022
Keywords
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Variational Bayesian posterior inference often requires simplifying approximations such as mean-field parametrisation to ensure tractability. However, prior work has associated the variational mean-field approximation for Bayesian neural networks with underfitting in the case of small datasets or large model sizes. In this work, we show that invariances in the likelihood function of over-parametrised models contribute to this phenomenon because these invariances complicate the structure of the posterior by introducing discrete and/or continuous modes which cannot be well approximated by Gaussian mean-field distributions. In particular, we show that the mean-field approximation has an additional gap in the evidence lower bound compared to a purpose-built posterior that takes into account the known invariances. Importantly, this invariance gap is not constant; it vanishes as the approximation reverts to the prior. We proceed by first considering translation invariances in a linear model with a single data point in detail. We show that, while the true posterior can be constructed from a mean-field parametrisation, this is achieved only if the objective function takes into account the invariance gap. Then, we transfer our analysis of the linear model to neural networks. Our analysis provides a framework for future work to explore solutions to the invariance problem.
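As a minimal numerical sketch (not from the paper), the snippet below illustrates the kind of effect the abstract describes: in a linear-Gaussian model y = w1 + w2 + noise with a single data point, the likelihood is invariant to the translation (w1, w2) -> (w1 + c, w2 - c), so the exact posterior is correlated along that direction and the best factorised Gaussian attains an ELBO strictly below the exact log-evidence. The parameter values (sigma2, prior_var, y) are illustrative assumptions, and the closed-form gap computed here is the standard KL divergence between a Gaussian and its best mean-field factorisation, not the invariance gap defined in the paper.

```python
import numpy as np

# Toy linear-Gaussian model with one observation: y = w1 + w2 + noise.
# The likelihood only depends on w1 + w2, so it is invariant to the
# translation (w1, w2) -> (w1 + c, w2 - c); the exact posterior is
# correlated along that direction, which a factorised (mean-field)
# Gaussian cannot capture.

sigma2 = 0.1      # observation-noise variance (illustrative value)
prior_var = 1.0   # isotropic Gaussian prior variance on (w1, w2)
y = 1.5           # the single observed data point

X = np.array([[1.0, 1.0]])                         # design row for w1 + w2
Lambda = np.eye(2) / prior_var + X.T @ X / sigma2  # exact posterior precision

# Exact log-evidence: y ~ N(0, X Sigma_prior X^T + sigma2).
marg_var = 2.0 * prior_var + sigma2
log_evidence = -0.5 * (np.log(2.0 * np.pi * marg_var) + y**2 / marg_var)

# The optimal mean-field Gaussian matches the posterior means and uses
# variances 1 / Lambda_ii; its ELBO falls short of log p(y) by
# KL(q || posterior) = 0.5 * log( prod_i Lambda_ii / det(Lambda) ).
kl_gap = 0.5 * (np.log(np.diag(Lambda)).sum() - np.linalg.slogdet(Lambda)[1])

print(f"exact log-evidence      : {log_evidence:.4f}")
print(f"optimal mean-field ELBO : {log_evidence - kl_gap:.4f}")
print(f"gap from correlation    : {kl_gap:.4f}")  # grows as sigma2 -> 0
```

In this toy model, increasing sigma2 (so the posterior reverts towards the prior) drives the gap to zero, mirroring the behaviour of the invariance gap described in the abstract; shrinking sigma2 sharpens the invariant ridge and widens it.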
Pages: 12