Emergence of Invariance and Disentanglement in Deep Representations
Cited: 0
Authors:
Achille, Alessandro [1]
Soatto, Stefano [1]
Affiliations:
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source:
2018 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA), 2018
Keywords:
Deep learning;
neural network;
representation;
flat minima;
information bottleneck;
overfitting;
generalization;
sufficiency;
minimality;
sensitivity;
information complexity;
stochastic gradient descent;
regularization;
total correlation;
PAC-Bayes;
DOI:
Not available
Chinese Library Classification (CLC):
TP [Automation and Computer Technology];
Subject Classification Code:
0812;
Abstract:
Using established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel usage of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also has an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture to show that the information in the weights bounds the minimality and Total Correlation of the layers, thereby showing that regularizing the weights explicitly or implicitly, using SGD, not only helps avoid overfitting, but also fosters invariance and disentangling of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called "flat minima," and generalization.
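As a reading aid, the learning criterion the abstract refers to, the Information Bottleneck Lagrangian applied to the weights, can be sketched in LaTeX as below; the posterior q(w | D) over the weights given the training set D, the trade-off parameter beta, and the exact notation are assumptions made for illustration rather than quotes from the paper:

\mathcal{L}\bigl(q(w \mid \mathcal{D})\bigr) = H_{p,q}(y \mid x, w) + \beta \, I(w; \mathcal{D})

Here H_{p,q}(y | x, w) is the usual cross-entropy (sufficiency) term on the training task, and I(w; D) measures how much information the weights store about the training set. Per the abstract, keeping this second term small is what limits memorization and, through the duality between weights and activations, bounds the minimality and Total Correlation of the learned layer representations.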
机构:
Fujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Fuzhou Univ, Coll Comp & Data Sci, Fuzhou 350108, Peoples R China
Chinese Acad Sci, Inst Automat, State Key Lab Management & Control Complex Syst, Beijing 100190, Peoples R China
Fujian Univ Technol, Intelligent Transportat Syst Res Ctr, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Chen, Dewang
Lu, Yuqi
论文数: 0引用数: 0
h-index: 0
机构:
Fujian Univ Technol, Sch Comp Sci & Math, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
Lu, Yuqi
Hsu, Chih-Yu
论文数: 0引用数: 0
h-index: 0
机构:
Fujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R ChinaFujian Univ Technol, Sch Transportat, Fuzhou 350118, Fujian, Peoples R China
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Cheon, Jeonghwan
Baek, Seungdae
论文数: 0引用数: 0
h-index: 0
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Baek, Seungdae
Paik, Se-Bum
论文数: 0引用数: 0
h-index: 0
机构:
Korea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea
Korea Adv Inst Sci & Technol, Program Brain & Cognit Engn, Daejeon, South KoreaKorea Adv Inst Sci & Technol, Dept Bio & Brain Engn, Daejeon, South Korea