Unification of symmetries inside neural networks: transformer, feedforward and neural ODE

Cited by: 4
Authors
Hashimoto, Koji [1]
Hirono, Yuji [1]
Sannai, Akiyoshi [1]
Affiliations
[1] Kyoto Univ, Dept Phys, Kyoto, Japan
Source
MACHINE LEARNING-SCIENCE AND TECHNOLOGY | 2024 / Vol. 5 / Issue 02
Keywords
neural networks; symmetry; gravity; DYNAMICAL MODEL; ANALOGY;
DOI
10.1088/2632-2153/ad5927
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Understanding the inner workings of neural networks, including transformers, remains one of the most challenging puzzles in machine learning. This study introduces a novel approach by applying the principles of gauge symmetries, a key concept in physics, to neural network architectures. By regarding model functions as physical observables, we find that parametric redundancies of various machine learning models can be interpreted as gauge symmetries. We mathematically formulate the parametric redundancies in neural ODEs, and find that their gauge symmetries are given by spacetime diffeomorphisms, which play a fundamental role in Einstein's theory of gravity. Viewing neural ODEs as a continuum version of feedforward neural networks, we show that the parametric redundancies in feedforward neural networks are indeed lifted to diffeomorphisms in neural ODEs. We further extend our analysis to transformer models, finding natural correspondences with neural ODEs and their gauge symmetries. The concept of gauge symmetries sheds light on the complex behavior of deep learning models through physics and provides us with a unifying perspective for analyzing various machine learning architectures.
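The "parametric redundancies" mentioned in the abstract can be illustrated with a small numerical check. The sketch below is not taken from the paper. It assumes a two-layer ReLU network (the function name `mlp` and all shapes are hypothetical) and uses two well-known redundancies of such networks, hidden-unit permutation and positive rescaling, as stand-ins for the kind of parameter change that leaves the model function untouched.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """Two-layer feedforward network with ReLU hidden units (illustrative only)."""
    h = np.maximum(W1 @ x + b1, 0.0)
    return W2 @ h + b2

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 3, 5, 2  # arbitrary sizes chosen for the example
W1 = rng.normal(size=(d_hidden, d_in))
b1 = rng.normal(size=d_hidden)
W2 = rng.normal(size=(d_out, d_hidden))
b2 = rng.normal(size=d_out)
x = rng.normal(size=d_in)

y = mlp(x, W1, b1, W2, b2)

# Redundancy 1: relabel the hidden units with a permutation.
# Rows of W1 and b1 and columns of W2 are permuted consistently.
perm = rng.permutation(d_hidden)
y_perm = mlp(x, W1[perm], b1[perm], W2[:, perm], b2)

# Redundancy 2: rescale each hidden unit by a positive factor.
# ReLU satisfies relu(s * z) = s * relu(z) for s > 0, so dividing the
# matching column of W2 by s undoes the change.
scale = rng.uniform(0.5, 2.0, size=d_hidden)
y_scaled = mlp(x, scale[:, None] * W1, scale * b1, W2 / scale, b2)

same_under_permutation = np.allclose(y, y_perm)
same_under_rescaling = np.allclose(y, y_scaled)
print(same_under_permutation, same_under_rescaling)
```

Both transformations change the parameters but not the input-output map, which is the sense in which the abstract treats the model function as the observable and the parameter change as a gauge transformation.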
Pages: 15