Superpolynomial Lower Bounds for Learning One-Layer Neural Networks using Gradient Descent

Cited by: 0
Authors
Goel, Surbhi [1 ]
Gollakota, Aravind [1 ]
Jin, Zhihan [2 ]
Karmalkar, Sushrut [1 ]
Klivans, Adam [1 ]
Affiliations
[1] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[2] Shanghai Jiao Tong Univ, Dept Comp Sci, Shanghai, Peoples R China
Source
25TH AMERICAS CONFERENCE ON INFORMATION SYSTEMS (AMCIS 2019) | 2019
Keywords
DOI
Not available
CLC number
TP [automation technology; computer technology];
Discipline code
0812;
Abstract
We give the first superpolynomial lower bounds for learning one-layer neural networks with respect to the Gaussian distribution for a broad class of algorithms. We prove that gradient descent run on any classifier with respect to square loss will fail to achieve small test error in polynomial time. Prior work held only for gradient descent run with small batch sizes and sufficiently smooth classifiers. For classification, we give a stronger result, namely that any statistical query (SQ) algorithm (including gradient descent) will fail to achieve small test error in polynomial time. Our lower bounds hold for commonly used activations such as ReLU and sigmoid. The core of our result relies on a novel construction of a simple family of neural networks that are exactly orthogonal with respect to all spherically symmetric distributions.
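To make the learning setting in the abstract concrete, here is a minimal sketch (not taken from the paper) of the setup the lower bounds concern: a one-hidden-layer ReLU network trained by full-batch gradient descent on square loss, with examples drawn from a standard Gaussian. All dimensions, the learning rate, and the step count are hypothetical choices for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 10, 5, 2000          # input dim, hidden units, sample size (illustrative)

# Target: a fixed one-layer ReLU network f*(x) = sum_i a*_i ReLU(w*_i . x)
W_star = rng.standard_normal((k, d)) / np.sqrt(d)
a_star = rng.standard_normal(k)

X = rng.standard_normal((n, d))               # Gaussian examples
y = np.maximum(X @ W_star.T, 0.0) @ a_star    # labels from the target network

# Learner: same architecture, random initialization
W = rng.standard_normal((k, d)) / np.sqrt(d)
a = rng.standard_normal(k)
lr = 0.05

def loss(W, a):
    """Square loss of the learner network on the sample."""
    pred = np.maximum(X @ W.T, 0.0) @ a
    return 0.5 * np.mean((pred - y) ** 2)

init_loss = loss(W, a)

# Full-batch gradient descent on the square loss
for _ in range(200):
    Z = X @ W.T                           # pre-activations, shape (n, k)
    H = np.maximum(Z, 0.0)                # hidden activations
    err = H @ a - y                       # residuals, shape (n,)
    grad_a = H.T @ err / n
    # ReLU derivative = indicator of positive pre-activation
    grad_W = ((err[:, None] * a) * (Z > 0)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

final = loss(W, a)
print(init_loss, final)
```

On such a small instance gradient descent typically drives the empirical loss down; the paper's point is that for a suitably constructed family of target networks, no polynomial number of such gradient steps can achieve small test error.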
Pages: 10