From block-Toeplitz matrices to differential equations on graphs: towards a general theory for scalable masked Transformers

Cited by: 0
Authors
Choromanski, Krzysztof [1 ,2 ]
Lin, Han [2 ]
Chen, Haoxian [2 ]
Zhang, Tianyi [2 ]
Sehanobish, Arijit
Likhosherstov, Valerii [3 ]
Parker-Holder, Jack [4 ]
Sarlos, Tamas [5 ]
Weller, Adrian [3 ,6 ]
Weingarten, Thomas [7 ]
Affiliations
[1] Google Brain Robotics, Mountain View, CA 94043 USA
[2] Columbia Univ, New York, NY 10027 USA
[3] Univ Cambridge, Cambridge, England
[4] Univ Oxford, Oxford, England
[5] Google Res, Mountain View, CA USA
[6] Alan Turing Inst, London, England
[7] Google, Mountain View, CA USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
Keywords
DOI: not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper we provide, to the best of our knowledge, the first comprehensive approach for incorporating various masking mechanisms into Transformer architectures in a scalable way. We show that recent results on linear causal attention (Choromanski et al., 2021) and log-linear RPE-attention (Luo et al., 2021) are special cases of this general mechanism. Moreover, by casting the problem as a topological (graph-based) modulation of unmasked attention, we obtain several previously unknown results, including efficient d-dimensional RPE-masking and graph-kernel masking. We leverage many mathematical techniques, ranging from spectral analysis through dynamic programming and random walks to new algorithms for solving Markov processes on graphs. We provide a corresponding empirical evaluation.
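As a rough illustration of the kind of mechanism the abstract alludes to (a minimal sketch, not the authors' implementation), the snippet below shows how a Toeplitz relative-positional-encoding (RPE) mask can be folded into linear (kernelized) attention using FFT-based Toeplitz matrix-vector products, so the masked attention costs O(L log L) per feature dimension instead of O(L^2). The feature map, the function names (`toeplitz_matvec`, `rpe_masked_linear_attention`), and the Gaussian RPE profile are all hypothetical choices made for the example.

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, v):
    """Multiply the L x L Toeplitz matrix T (T[i, 0] = first_col[i],
    T[0, j] = first_row[j]) by v in O(L log L) via circulant embedding + FFT."""
    L = len(v)
    circ = np.concatenate([first_col, [0.0], first_row[1:][::-1]])   # first column of the 2L circulant
    prod = np.fft.ifft(np.fft.fft(circ) * np.fft.fft(np.concatenate([v, np.zeros(L)])))
    return prod.real[:L]

def rpe_masked_linear_attention(Q, K, V, rpe):
    """Compute the unnormalized masked attention (phi(Q) phi(K)^T  (Hadamard)  M) V
    without materializing the L x L matrix, where M[i, j] = rpe[L - 1 + i - j]
    is a Toeplitz RPE mask.  Uses the identity
    (A (Hadamard) M) V = sum_m diag(phiQ[:, m]) M diag(phiK[:, m]) V."""
    phiQ, phiK = np.exp(Q), np.exp(K)      # toy positive feature map (stand-in for a kernel feature map)
    L, d = phiQ.shape
    idx = np.arange(L)
    first_col = rpe[L - 1 + idx]           # mask values for offsets 0 .. L-1
    first_row = rpe[L - 1 - idx]           # mask values for offsets 0 .. -(L-1)
    out = np.zeros_like(V, dtype=float)
    for m in range(d):
        weighted_V = phiK[:, m:m + 1] * V  # diag(phiK[:, m]) V
        for c in range(V.shape[1]):
            out[:, c] += phiQ[:, m] * toeplitz_matvec(first_col, first_row, weighted_V[:, c])
    return out

# Toy usage: L = 8 tokens, head dimension 4, Gaussian mask over relative offsets.
L, d = 8, 4
rng = np.random.default_rng(0)
Q, K, V = rng.normal(size=(3, L, d))
offsets = np.arange(-(L - 1), L)
rpe = np.exp(-0.1 * offsets**2)
print(rpe_masked_linear_attention(Q, K, V, rpe).shape)   # (8, 4)
```

The same Hadamard-product decomposition is what makes structured (e.g., block-Toeplitz or graph-induced) masks compatible with sub-quadratic attention: only fast matrix-vector products with the mask are required, never the full L x L mask itself.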
Pages: 22