Memorisation versus Generalisation in Pre-trained Language Models

Cited: 0
Authors
Tanzer, Michael [1]
Ruder, Sebastian [2]
Rei, Marek [1]
Affiliations
[1] Imperial College London, London, England
[2] Google Research, Mountain View, CA, USA
Source
PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022), VOL 1: (LONG PAPERS) | 2022
Keywords
(none listed)
DOI
(not available)
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
State-of-the-art pre-trained language models have been shown to memorise facts and perform well with limited amounts of training data. To gain a better understanding of how these models learn, we study their generalisation and memorisation capabilities in noisy and low-resource scenarios. We find that the training of these models is almost unaffected by label noise and that it is possible to reach near-optimal results even on extremely noisy datasets. However, our experiments also show that they mainly learn from high-frequency patterns and largely fail when tested on low-resource tasks such as few-shot learning and rare entity recognition. To mitigate such limitations, we propose an extension based on prototypical networks that improves performance in low-resource named entity recognition tasks.
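The abstract names prototypical networks as the basis of the proposed extension but does not detail them. As a rough, hypothetical illustration of the general technique (not the authors' implementation), a prototypical classifier computes one prototype per class as the mean embedding of that class's support examples, then labels each query by its nearest prototype:

```python
import numpy as np

def prototypes(support_emb, support_labels):
    # One prototype per class: the mean embedding of its support examples.
    classes = sorted(set(support_labels))
    protos = np.stack([
        np.mean([e for e, y in zip(support_emb, support_labels) if y == c], axis=0)
        for c in classes
    ])
    return classes, protos

def classify(query_emb, classes, protos):
    # Assign each query embedding to the nearest prototype (Euclidean distance).
    dists = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return [classes[i] for i in dists.argmin(axis=1)]

# Toy example: 2-D "embeddings" for a non-entity class and one entity class.
support = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
labels = ["O", "ENT", "ENT", "ENT"][:0] or ["O", "O", "ENT", "ENT"]
classes, protos = prototypes(support, labels)
print(classify(np.array([[0.1, 0.0], [1.0, 0.9]]), classes, protos))  # → ['O', 'ENT']
```

In a few-shot NER setting, the embeddings would come from the pre-trained encoder; classifying by distance to class prototypes rather than through a learned softmax layer is what makes the approach usable with very few labelled examples per class.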
Pages: 7564-7578
Page count: 15