[3] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA USA
Source:
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021) | 2021
Funding:
US National Science Foundation (NSF);
Keywords:
STATISTICAL-MODELS;
DOI:
Not available
CLC Classification:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104; 0812; 0835; 1405;
Abstract:
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
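The trade-off the abstract draws can be made concrete with a minimal sketch in Python (hypothetical toy models for illustration only; the names VOCAB, ar_log_prob, and energy are not from the paper). An autoregressive model scores a string via a chain of locally normalized next-symbol distributions, each computable in polynomial time, whereas an energy-based model assigns an arbitrary efficiently computable score to the whole string: scoring stays cheap, but exact normalization or sampling would require an intractable sum over all strings, which is the sense in which it "gives up efficient sampling."

import math

VOCAB = ["a", "b", "<eos>"]

def ar_next_symbol_probs(prefix):
    # Autoregressive model: a locally normalized distribution over the next
    # symbol, computable in time polynomial in len(prefix).
    # (Trivial uniform stand-in for a learned model.)
    return {w: 1.0 / len(VOCAB) for w in VOCAB}

def ar_log_prob(seq):
    # Scoring a string is efficient: one next-symbol distribution per position.
    lp = 0.0
    for t, sym in enumerate(seq):
        lp += math.log(ar_next_symbol_probs(seq[:t])[sym])
    return lp

def energy(seq):
    # Energy-based model: any efficiently computable score of the WHOLE string;
    # no local normalization is required. Toy energy: penalize the bigram "ab".
    return sum(1.0 for x, y in zip(seq, seq[1:]) if (x, y) == ("a", "b"))

def ebm_score(seq):
    # Cheap to evaluate, but only up to the partition function: exact sampling
    # or normalization would sum exp(-energy(s)) over all strings s.
    return -energy(seq)

print(ar_log_prob(["a", "b", "<eos>"]))  # 3 * log(1/3)
print(ebm_score(["a", "b", "<eos>"]))    # -1.0

A latent-variable autoregressive model would face the mirror-image cost: sampling remains easy (sample the latent, then decode autoregressively), but scoring a given string requires marginalizing over the latent variable.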