Limitations of Autoregressive Models and Their Alternatives

Cited by: 0
Authors
Lin, Chu-Cheng [1 ]
Jaech, Aaron [2 ]
Li, Xin [1 ]
Gormley, Matthew R. [3 ]
Eisner, Jason [1 ]
Affiliations
[1] Johns Hopkins Univ, Dept Comp Sci, Baltimore, MD 21218 USA
[2] Facebook AI, New York, NY USA
[3] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA USA
Source
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021) | 2021
Funding
U.S. National Science Foundation (NSF)
Keywords
STATISTICAL-MODELS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
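
The tradeoff described in the abstract can be made concrete with a minimal, hypothetical Python sketch (not code from the paper): a toy autoregressive model whose next-symbol distribution is computable in polynomial time supports both exact scoring and exact ancestral sampling, while a toy energy-based model gives a cheap unnormalized score but needs a sum over exponentially many strings to recover its normalizing constant. The vocabulary, energy function, and all function names below are invented for illustration only.

import itertools
import math
import random

VOCAB = ["a", "b", "<eos>"]

def ar_next_prob(prefix):
    # Toy autoregressive conditional p(next symbol | prefix): uniform over VOCAB.
    # Any polynomial-time function of the prefix could be substituted here.
    return {sym: 1.0 / len(VOCAB) for sym in VOCAB}

def ar_score(string):
    # Exact probability of a string under the autoregressive model: O(n) work,
    # one locally normalized factor per symbol plus the end-of-string symbol.
    logp, prefix = 0.0, []
    for sym in list(string) + ["<eos>"]:
        logp += math.log(ar_next_prob(prefix)[sym])
        prefix.append(sym)
    return math.exp(logp)

def ar_sample():
    # Exact ancestral sampling: draw one symbol at a time from the conditionals.
    prefix = []
    while True:
        probs = ar_next_prob(prefix)
        sym = random.choices(list(probs), weights=list(probs.values()))[0]
        if sym == "<eos>":
            return "".join(prefix)
        prefix.append(sym)

def energy(string):
    # Toy (hypothetical) energy function; evaluating it on one string is cheap.
    return abs(string.count("a") - string.count("b"))

def ebm_partition(max_len):
    # Exact normalizing constant Z over strings of length <= max_len: requires
    # enumerating 2^0 + ... + 2^max_len strings, which is why exact sampling
    # and normalization for energy-based models do not scale.
    strings = ("".join(chars) for n in range(max_len + 1)
               for chars in itertools.product("ab", repeat=n))
    return sum(math.exp(-energy(s)) for s in strings)

if __name__ == "__main__":
    print(ar_score("ab"))       # (1/3)^3: one factor each for 'a', 'b', '<eos>'
    print(ar_sample())          # a random string over {a, b}
    print(ebm_partition(10))    # already enumerates 2047 strings

In practice the normalizing constant of an energy-based model is approximated rather than enumerated; the brute-force enumeration here is only meant to make the exponential blow-up, and hence the lost efficient sampling, concrete.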
Pages: 5147-5172
Page count: 26