Memorisation versus Generalisation in Pre-trained Language Models

Cited: 0
Authors
Tanzer, Michael [1]
Ruder, Sebastian [2]
Rei, Marek [1]
Affiliations
[1] Imperial College London, London, England
[2] Google Research, Mountain View, CA, USA
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022
Keywords
(none listed)
DOI
(not available)
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
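The abstract names prototypical networks as the basis of the proposed extension but does not detail them. As a rough, hypothetical illustration of the general technique (not the authors' implementation), a prototypical classifier computes one prototype per class as the mean embedding of that class's support examples, then labels each query by its nearest prototype:

```python
import numpy as np

def prototypes(support_emb, support_labels):
    # One prototype per class: the mean embedding of its support examples.
    classes = sorted(set(support_labels))
    protos = np.stack([
        np.mean([e for e, y in zip(support_emb, support_labels) if y == c], axis=0)
        for c in classes
    ])
    return classes, protos

def classify(query_emb, classes, protos):
    # Assign each query embedding to the nearest prototype (Euclidean distance).
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy example: 2-D "embeddings" for a non-entity class and one entity class.
support = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = ["O", "ENT", "ENT", "ENT"][:0] or ["O", "O", "ENT", "ENT"]
classes, protos = prototypes(support, labels)
print(classify(np.array([[0.1, 0.0], [1.0, 0.9]]), classes, protos))  # → ['O', 'ENT']
```

In a few-shot NER setting, the embeddings would come from the pre-trained encoder; classifying by distance to class prototypes rather than through a learned softmax layer is what makes the approach usable with very few labelled examples per class.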
Pages: 7564-7578
Page count: 15