A Primer in BERTology: What We Know About How BERT Works

Cited by: 720
Authors
Rogers, Anna [1 ]
Kovaleva, Olga [2 ]
Rumshisky, Anna [2 ]
Affiliations
[1] Univ Copenhagen, Ctr Social Data Sci, Copenhagen, Denmark
[2] Univ Massachusetts, Dept Comp Sci, Lowell, MA USA
DOI
10.1162/tacl_a_00349
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Transformer-based models have pushed state of the art in many areas of NLP, but our understanding of what is behind their success is still limited. This paper is the first survey of over 150 studies of the popular BERT model. We review the current state of knowledge about how BERT works, what kind of information it learns and how it is represented, common modifications to its training objectives and architecture, the overparameterization issue, and approaches to compression. We then outline directions for future research.
Pages: 842-866
Page count: 25