Emergence of Invariance and Disentanglement in Deep Representations

Cited by: 0
Authors
Achille, Alessandro [1]
Soatto, Stefano [1]
Affiliations
[1] Univ Calif Los Angeles, Dept Comp Sci, Los Angeles, CA 90095 USA
Source
2018 INFORMATION THEORY AND APPLICATIONS WORKSHOP (ITA) | 2018
Keywords
Deep learning; neural network; representation; flat minima; information bottleneck; overfitting; generalization; sufficiency; minimality; sensitivity; information complexity; stochastic gradient descent; regularization; total correlation; PAC-Bayes;
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
Using established principles from Information Theory and Statistics, we show that in a deep neural network invariance to nuisance factors is equivalent to information minimality of the learned representation, and that stacking layers and injecting noise during training naturally bias the network towards learning invariant representations. We then show that, in order to avoid memorization, we need to limit the quantity of information stored in the weights, which leads to a novel use of the Information Bottleneck Lagrangian on the weights as a learning criterion. This also has an alternative interpretation as minimizing a PAC-Bayesian bound on the test error. Finally, we exploit a duality between weights and activations induced by the architecture to show that the information in the weights bounds the minimality and Total Correlation of the layers, thus showing that regularizing the weights, whether explicitly or implicitly via SGD, not only helps avoid overfitting but also fosters invariance and disentanglement of the learned representation. The theory also enables predicting sharp phase transitions between underfitting and overfitting random labels at precise information values, and sheds light on the relation between the geometry of the loss function, in particular so-called "flat minima," and generalization.
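Two quantities named in the abstract can be written compactly. The following is a minimal sketch in standard notation, with symbols assumed here rather than quoted from the paper: D is the training set, w the weights, q(w | D) the distribution over weights produced by training, H_{p,q} the cross-entropy loss, and beta the trade-off parameter of the Information Bottleneck Lagrangian on the weights:

    % Sketch (assumed notation): IB Lagrangian on the weights
    \mathcal{L}\big(q(w \mid \mathcal{D})\big)
      = \mathbb{E}_{w \sim q(w \mid \mathcal{D})}\!\left[ H_{p,q}(y \mid x, w) \right]
      + \beta \, I(w; \mathcal{D})

    % Standard definition of the Total Correlation of a representation z,
    % the disentanglement measure the abstract refers to
    \operatorname{TC}(z)
      = \operatorname{KL}\!\left( p(z) \,\Big\|\, \textstyle\prod_i p(z_i) \right)

Taking beta to zero recovers plain cross-entropy training; increasing beta penalizes the information the weights retain about the dataset, which is the mechanism the abstract credits with bounding memorization and with the PAC-Bayesian interpretation.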
Pages: 29