projUNN: efficient method for training deep networks with unitary matrices

Cited by: 0
Authors
Kiani, Bobak T. [1 ]
Balestriero, Randall [2 ]
LeCun, Yann [2 ,3 ]
Lloyd, Seth [1 ,4 ]
Affiliations
[1] MIT, Cambridge, MA 02139 USA
[2] Meta AI, FAIR, Toronto, ON, Canada
[3] NYU, New York, NY 10003 USA
[4] Turing Inc, New York, NY USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Keywords
MONTE-CARLO ALGORITHMS; RANK;
DOI
Not available
CLC classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In learning with recurrent or very deep feed-forward networks, employing unitary matrices in each layer can be very effective at maintaining long-range stability. However, restricting network parameters to be unitary typically comes at the cost of expensive parameterizations or increased training runtime. We propose instead an efficient method based on rank-k updates - or their rank-k approximation - that maintains performance at a nearly optimal training runtime. We introduce two variants of this method, named Direct (projUNN-D) and Tangent (projUNN-T) projected Unitary Neural Networks, that can parameterize full N-dimensional unitary or orthogonal matrices with a training runtime scaling as O(kN²). Our method either projects low-rank gradients onto the closest unitary matrix (projUNN-D) or transports unitary matrices in the direction of the low-rank gradient (projUNN-T). Even in the fastest setting (k = 1), projUNN is able to train a model's unitary parameters to reach comparable performance against baseline implementations. In recurrent neural network settings, projUNN closely matches or exceeds benchmarked results from prior unitary neural networks. Finally, we preliminarily explore projUNN in training orthogonal convolutional neural networks, which are currently unable to outperform state-of-the-art models but can potentially enhance stability and robustness at large depth.
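For intuition, the sketch below shows what a projUNN-D style update could look like in plain NumPy. It is an assumption-laden illustration, not the paper's implementation: all function names are hypothetical, and the projection here uses a full SVD and therefore costs O(N³), whereas projUNN exploits the rank-k structure of the update to perform the equivalent projection in O(kN²). The underlying fact used is standard: the closest unitary (here orthogonal) matrix in Frobenius norm is the unitary factor of the polar decomposition.

import numpy as np

def rank_k_approx(G, k):
    # Best rank-k approximation of the gradient via truncated SVD.
    U, s, Vh = np.linalg.svd(G, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vh[:k, :]

def project_to_unitary(A):
    # Closest unitary/orthogonal matrix to A in Frobenius norm (polar factor of A).
    W, _, Vh = np.linalg.svd(A)
    return W @ Vh

def projunn_d_step(U, grad, lr=0.05, k=1):
    # Gradient step with a rank-k gradient, then projection back onto the unitary manifold.
    G_k = rank_k_approx(grad, k)
    return project_to_unitary(U - lr * G_k)

# Example: one rank-1 update of an 8x8 orthogonal parameter matrix.
rng = np.random.default_rng(0)
U = project_to_unitary(rng.standard_normal((8, 8)))   # random orthogonal starting point
grad = rng.standard_normal((8, 8))                     # stand-in for dL/dU from backprop
U = projunn_d_step(U, grad, lr=0.05, k=1)
print(np.allclose(U @ U.T, np.eye(8)))                 # True: orthogonality is preserved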
Pages: 16