Of Non-Linearity and Commutativity in BERT

Cited by: 11
Authors
Zhao, Sumu [1]
Pascual, Damian [1]
Brunner, Gino [1]
Wattenhofer, Roger [1]
Affiliation
[1] Swiss Federal Institute of Technology (ETH Zurich), Zurich, Switzerland
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
DOI
10.1109/IJCNN52387.2021.9533563
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work we provide new insights into the transformer architecture, and in particular, its best-known variant, BERT. First, we propose a method to measure the degree of non-linearity of different elements of transformers. Next, we focus our investigation on the feed-forward networks (FFN) inside transformers, which contain two thirds of the model parameters and have so far not received much attention. We find that FFNs are an inefficient yet important architectural element and that they cannot simply be replaced by attention blocks without a degradation in performance. Moreover, we study the interactions between layers in BERT and show that, while the layers exhibit some hierarchical structure, they extract features in a fuzzy manner. Our results suggest that BERT has an inductive bias towards layer commutativity, which we find is mainly due to the skip connections. This provides a justification for the strong performance of recurrent and weight-shared transformer models.
Pages: 8