[3] Carnegie Mellon Univ, Machine Learning Dept, Pittsburgh, PA USA
Source:
2021 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL-HLT 2021) | 2021
Funding:
US National Science Foundation (NSF);
Keywords:
STATISTICAL-MODELS;
DOI:
Not available
CLC Classification:
TP18 [Artificial Intelligence Theory];
Discipline Codes:
081104; 0812; 0835; 1405;
Abstract:
Standard autoregressive language models perform only polynomial-time computation to compute the probability of the next symbol. While this is attractive, it means they cannot model distributions whose next-symbol probability is hard to compute. Indeed, they cannot even model them well enough to solve associated easy decision problems for which an engineer might want to consult a language model. These limitations apply no matter how much computation and data are used to train the model, unless the model is given access to oracle parameters that grow superpolynomially in sequence length. Thus, simply training larger autoregressive language models is not a panacea for NLP. Alternatives include energy-based models (which give up efficient sampling) and latent-variable autoregressive models (which give up efficient scoring of a given string). Both are powerful enough to escape the above limitations.
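The trade-off the abstract draws can be made concrete with a minimal sketch in Python (hypothetical toy models for illustration only; the names VOCAB, ar_log_prob, and energy are not from the paper). An autoregressive model scores a string via a chain of locally normalized next-symbol distributions, each computable in polynomial time, whereas an energy-based model assigns an arbitrary efficiently computable score to the whole string: scoring stays cheap, but exact normalization or sampling would require an intractable sum over all strings, which is the sense in which it "gives up efficient sampling."

import math

VOCAB = ["a", "b", "<eos>"]

def ar_next_symbol_probs(prefix):
    # Autoregressive model: a locally normalized distribution over the next
    # symbol, computable in time polynomial in len(prefix).
    # (Trivial uniform stand-in for a learned model.)
    return {w: 1.0 / len(VOCAB) for w in VOCAB}

def ar_log_prob(seq):
    # Scoring a string is efficient: one next-symbol distribution per position.
    lp = 0.0
    for t, sym in enumerate(seq):
        lp += math.log(ar_next_symbol_probs(seq[:t])[sym])
    return lp

def energy(seq):
    # Energy-based model: any efficiently computable score of the WHOLE string;
    # no local normalization is required. Toy energy: penalize the bigram "ab".
    return sum(1.0 for x, y in zip(seq, seq[1:]) if (x, y) == ("a", "b"))

def ebm_score(seq):
    # Cheap to evaluate, but only up to the partition function: exact sampling
    # or normalization would sum exp(-energy(s)) over all strings s.
    return -energy(seq)

print(ar_log_prob(["a", "b", "<eos>"]))  # 3 * log(1/3)
print(ebm_score(["a", "b", "<eos>"]))    # -1.0

A latent-variable autoregressive model would face the mirror-image cost: sampling remains easy (sample the latent, then decode autoregressively), but scoring a given string requires marginalizing over the latent variable.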