Emergence of Invariance and Disentanglement in Deep Representations
Cited: 0
Authors:
Achille, Alessandro [1]
Soatto, Stefano [1]
Affiliations:
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source:
2018 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2018
Keywords:
Deep learning;
neural network;
representation;
flat minima;
information bottleneck;
overfitting;
generalization;
sufficiency;
minimality;
sensitivity;
information complexity;
stochastic gradient descent;
regularization;
total correlation;
PAC-Bayes;
DOI:
Not available
Chinese Library Classification (CLC):
TP [Automation and Computer Technology];
Subject Classification Code:
0812;
Abstract:
Using established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also has an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture to show that the information in the weights bounds the minimality and Total Correlation of the layers, thereby showing that regularizing the weights explicitly or implicitly, using SGD, not only helps avoid overfitting, but also fosters invariance and disentangling of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called "flat minima," and generalization.
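As a reading aid, the learning criterion the abstract refers to, the Information Bottleneck Lagrangian applied to the weights, can be sketched in LaTeX as below; the posterior q(w | D) over the weights given the training set D, the trade-off parameter beta, and the exact notation are assumptions made for illustration rather than quotes from the paper:

\mathcal{L}\bigl(q(w \mid \mathcal{D})\bigr) = H_{p,q}(y \mid x, w) + \beta \, I(w; \mathcal{D})

Here H_{p,q}(y | x, w) is the usual cross-entropy (sufficiency) term on the training task, and I(w; D) measures how much information the weights store about the training set. Per the abstract, keeping this second term small is what limits memorization and, through the duality between weights and activations, bounds the minimality and Total Correlation of the learned layer representations.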
机构:
Fujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350108, Peoples R China
Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Fujian Univ Technol, Intelligent Transportat Syst Res Ctr, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Chen, Dewang
Lu, Yuqi
论文数: 0引用数: 0
h-index: 0
机构:
Fujian Univ Technol, Sch Comp Sci & Math, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Lu, Yuqi
Hsu, Chih-Yu
论文数: 0引用数: 0
h-index: 0
机构:
Fujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Cheon, Jeonghwan
Baek, Seungdae
论文数: 0引用数: 0
h-index: 0
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Baek, Seungdae
Paik, Se-Bum
论文数: 0引用数: 0
h-index: 0
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Korea Adv Inst Sci & Technol, Program Brain & Cognit Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea