Distributed Variational Representation Learning

Cited by: 45
Authors
Estella-Aguerri, Inaki [1]
Zaidi, Abdellatif [1,2]
Affiliations
[1] Huawei Technol France, Math & Algorithm Sci Lab, Paris Res Ctr, F-92100 Boulogne, France
[2] Univ Paris Est, F-77454 Champs Sur Marne, France
Keywords
Representation learning; distributed learning; information theory; neural networks; information bottleneck; INFORMATION; STABILITY; ENTROPY
DOI
10.1109/TPAMI.2019.2928806
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The problem of distributed representation learning is one in which multiple sources of information X_1, ..., X_K are processed separately so as to learn as much information as possible about some ground truth Y. We investigate this problem on information-theoretic grounds, through a generalization of Tishby's centralized Information Bottleneck (IB) method to the distributed setting. Specifically, K encoders, K ≥ 2, compress their observations X_1, ..., X_K separately in a manner such that, collectively, the produced representations preserve as much information as possible about Y. We study both discrete memoryless (DM) and memoryless vector Gaussian data models. For the discrete model, we establish a single-letter characterization of the optimal tradeoff between complexity (or rate) and relevance (or information) for a class of memoryless sources (the observations X_1, ..., X_K being conditionally independent given Y). For the vector Gaussian model, we provide an explicit characterization of the optimal complexity-relevance tradeoff. Furthermore, we develop a variational bound on the complexity-relevance tradeoff which generalizes the evidence lower bound (ELBO) to the distributed setting. We also provide two algorithms for computing this bound: i) a Blahut-Arimoto type iterative algorithm which computes optimal complexity-relevance encoding mappings by iterating over a set of self-consistent equations, and ii) a variational inference type algorithm in which the encoding mappings are parametrized by neural networks and the bound is approximated by Markov sampling and optimized with stochastic gradient descent. Numerical results on synthetic and real datasets are provided to support the efficiency of the approaches and algorithms developed in this paper.
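For orientation, the following is a minimal sketch of the objective described in the abstract, with notation of our own choosing rather than the paper's. The centralized IB trades relevance against complexity as

\[
\max_{P_{U|X}} \; I(U;Y) \;-\; \beta\, I(U;X),
\]

and the distributed generalization discussed above optimizes the K encoding mappings jointly,

\[
\max_{P_{U_1|X_1},\ldots,P_{U_K|X_K}} \; I(U_1,\ldots,U_K;\,Y) \;-\; \sum_{k=1}^{K} \beta_k\, I(U_k;X_k),
\]

where U_k is the representation produced by encoder k and the multipliers \beta_k set the per-encoder complexity-relevance tradeoff. The variational bound and the two algorithms mentioned in the abstract can be read as ways of evaluating and optimizing such an objective when the involved distributions are intractable.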
Pages: 120-138
Page count: 19