A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions

Cited: 0
Authors
Adlam, Ben [1]
Levinson, Jake [1]
Pennington, Jeffrey [1]
Affiliations
[1] Google Res, Mountain View, CA 94043 USA
Source
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS | 2022, Vol. 151
Keywords
SAMPLE COVARIANCE MATRICES; SPECTRAL DISTRIBUTION; EIGENVALUES
DOI
Not available
CLC number (Chinese Library Classification)
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
One of the distinguishing characteristics of modern deep learning systems is their use of neural network architectures with enormous numbers of parameters, often in the millions and sometimes even in the billions. While this paradigm has inspired significant research on the properties of large networks, relatively little work has been devoted to the fact that these networks are often used to model large complex datasets, which may themselves contain millions or even billions of constraints. In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity. We analyze the performance of random feature regression with features F = f (WX + B) for a random weight matrix W and bias vector B, obtaining exact formulae for the asymptotic training and test errors for data generated by a linear teacher model. The role of the bias can be understood as parameterizing a distribution over activation functions, and our analysis directly generalizes to such distributions, even those not expressible with a traditional additive bias. Intriguingly, we find that a mixture of nonlinearities can improve both the training and test errors over the best single nonlinearity, suggesting that mixtures of nonlinearities might be useful for approximate kernel methods or neural network architecture design.
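The random feature regression setup described in the abstract, with features F = f(WX + B) and a mixture of nonlinearities, can be sketched numerically. The dimensions, noise level, ridge penalty, and the particular ReLU/tanh split used below are illustrative assumptions for a finite-size toy run, not the paper's exact setup or its asymptotic formulae.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative finite sizes; the paper's analysis takes input dimension,
# feature count, and sample count to infinity at fixed ratios.
n0, n1, m = 50, 200, 500  # input dim, number of random features, samples

# Data generated by a linear teacher with additive noise (assumed form).
X = rng.standard_normal((n0, m))
beta = rng.standard_normal(n0)
y = beta @ X / np.sqrt(n0) + 0.1 * rng.standard_normal(m)

# Random features F = f(WX + B). Here, instead of relying only on the bias B,
# half the features use ReLU and half use tanh -- a simple stand-in for a
# mixture of nonlinearities.
W = rng.standard_normal((n1, n0)) / np.sqrt(n0)
B = rng.standard_normal((n1, 1))
Z = W @ X + B
F = np.concatenate([np.maximum(Z[: n1 // 2], 0.0), np.tanh(Z[n1 // 2:])])

# Ridge regression on the random features.
lam = 1e-2
a = np.linalg.solve(F @ F.T / m + lam * np.eye(n1), F @ y / m)
train_err = float(np.mean((a @ F - y) ** 2))
```

Comparing `train_err` across different nonlinearity mixes (all-ReLU, all-tanh, various splits) is one way to observe the paper's finding that a mixture can outperform the best single nonlinearity, at least at finite sizes.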
Pages: 24