Using the Metropolis algorithm to explore the loss surface of a recurrent neural network

Cited by: 0
Authors
Casert, Corneel [1 ]
Whitelam, Stephen [1 ]
Affiliations
[1] Molecular Foundry, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720
DOI
10.1063/5.0221223
Abstract
In the limit of small trial moves the Metropolis Monte Carlo algorithm is equivalent to gradient descent on the energy function in the presence of Gaussian white noise. This observation was originally used to demonstrate a correspondence between Metropolis Monte Carlo moves of model molecules and overdamped Langevin dynamics, but it also applies in the context of training a neural network: making small random changes to the weights of a neural network, accepted with the Metropolis probability, with the loss function playing the role of energy, has the same effect as training by explicit gradient descent in the presence of Gaussian white noise. We explore this correspondence in the context of a simple recurrent neural network. We also explore regimes in which this correspondence breaks down, where the gradient of the loss function becomes very large or small. In these regimes the Metropolis algorithm can still effect training, and so can be used as a probe of the loss function of a neural network in regimes in which gradient descent struggles. We also show that training can be accelerated by making purposely-designed Monte Carlo trial moves of neural-network weights. © 2024 American Institute of Physics. All rights reserved.
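The training procedure described in the abstract can be sketched in a few lines: perturb the weights with small Gaussian trial moves, treat the loss as the energy, and accept each move with the Metropolis probability. The toy regression task, hyperparameters, and function names below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression data (illustrative; the paper uses a recurrent network).
X = rng.normal(size=(64, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=64)

def loss(w):
    """Mean-squared error: plays the role of the energy."""
    return np.mean((X @ w - y) ** 2)

def metropolis_train(w, sigma=0.02, temperature=1e-4, steps=5000):
    """Propose small Gaussian weight changes; accept each with the
    Metropolis probability min(1, exp(-(L_new - L_old)/T))."""
    current = loss(w)
    for _ in range(steps):
        trial = w + sigma * rng.normal(size=w.shape)
        trial_loss = loss(trial)
        # Downhill moves are always accepted; uphill moves with
        # probability exp(-dL/T), so small T approximates noisy descent.
        if trial_loss < current or rng.random() < np.exp(
            -(trial_loss - current) / temperature
        ):
            w, current = trial, trial_loss
    return w, current

w0 = np.zeros(3)
w_final, final_loss = metropolis_train(w0)
```

In the small-`sigma`, small-`temperature` limit this behaves like gradient descent with Gaussian white noise, which is the correspondence the abstract describes; no gradient of `loss` is ever computed.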
References
42 in total
  • [1] Metropolis N., Rosenbluth A.W., Rosenbluth M.N., Teller A.H., Teller E., Equation of state calculations by fast computing machines, J. Chem. Phys., 21, pp. 1087-1092, (1953)
  • [2] Gubernatis J.E., Marshall Rosenbluth and the Metropolis algorithm, Phys. Plasmas, 12, (2005)
  • [3] Rosenbluth M.N., Genesis of the Monte Carlo algorithm for statistical mechanics, AIP Conf. Proc., 690, pp. 22-30, (2003)
  • [4] Whitacre M.H., Rosenbluth A.W., Tech. Rep., (2021)
  • [5] Bhanot G., The Metropolis algorithm, Rep. Prog. Phys., 51, (1988)
  • [6] Beichl I., Sullivan F., The Metropolis algorithm, Comput. Sci. Eng., 2, pp. 65-69, (2000)
  • [7] Frenkel D., Smit B., Understanding Molecular Simulation: From Algorithms to Applications, 1, (2001)
  • [8] Kikuchi K., Yoshida M., Maekawa T., Watanabe H., Metropolis Monte Carlo method as a numerical technique to solve the Fokker-Planck equation, Chem. Phys. Lett., 185, pp. 335-338, (1991)
  • [9] Kikuchi K., Yoshida M., Maekawa T., Watanabe H., Metropolis Monte Carlo method for Brownian dynamics simulation generalized to include hydrodynamic interactions, Chem. Phys. Lett., 196, pp. 57-61, (1992)
  • [10] Whitelam S., Selin V., Park S.-W., Tamblyn I., Correspondence between neuroevolution and gradient descent, Nat. Commun., 12, (2021)