Muppet: Massive Multi-task Representations with Pre-Finetuning

Cited by: 0
Authors
Aghajanyan, Armen [1]
Gupta, Anchit [1]
Shrivastava, Akshat [1]
Chen, Xilun [1]
Zettlemoyer, Luke [1]
Gupta, Sonal [1]
Affiliations
[1] Facebook, Menlo Park, CA 94025, USA
Source
2021 Conference on Empirical Methods in Natural Language Processing (EMNLP 2021) | 2021
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We propose pre-finetuning, an additional large-scale learning stage between language model pre-training and fine-tuning. Pre-finetuning is massively multi-task learning (around 50 datasets, over 4.8 million total labeled examples), and is designed to encourage learning of representations that generalize better to many different tasks. We show that pre-finetuning consistently improves performance for pretrained discriminators (e.g., RoBERTa) and generation models (e.g., BART) on a wide range of tasks (sentence prediction, commonsense reasoning, MRC, etc.), while also significantly improving sample efficiency during fine-tuning. We also show that large-scale multi-tasking is crucial; pre-finetuning can hurt performance when only a few tasks are used, up to a critical point (usually above 15) after which performance improves linearly in the number of tasks.
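To make the setup described in the abstract concrete, below is a minimal sketch (not the authors' released code) of massively multi-task pre-finetuning: a single shared encoder, one lightweight head per labeled task, trained jointly before any task-specific fine-tuning. The task names, model sizes, round-robin task sampling, and random data in the loop are illustrative assumptions, not details taken from the paper.

```python
import torch
import torch.nn as nn

# Sketch of multi-task pre-finetuning: a shared encoder (stand-in for
# RoBERTa/BART) plus one classification head per task. Only the encoder
# would be reused for downstream fine-tuning. All names and sizes here
# are illustrative; the paper uses ~50 real datasets.

class SharedEncoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_layers=2, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)

    def forward(self, input_ids):
        h = self.encoder(self.embed(input_ids))
        return h[:, 0]  # first-token representation as the sequence embedding

class MultiTaskModel(nn.Module):
    def __init__(self, task_num_labels, d_model=256):
        super().__init__()
        self.encoder = SharedEncoder(d_model=d_model)
        # One lightweight linear head per pre-finetuning task.
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d_model, n) for name, n in task_num_labels.items()}
        )

    def forward(self, task, input_ids):
        return self.heads[task](self.encoder(input_ids))

# Illustrative task registry: task name -> number of labels.
tasks = {"nli": 3, "sentiment": 2, "commonsense_qa": 5}
model = MultiTaskModel(tasks)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()

for step in range(100):  # toy loop with random tensors in place of real batches
    task = list(tasks)[step % len(tasks)]        # round-robin over tasks
    input_ids = torch.randint(0, 30522, (8, 32))  # (batch, seq_len)
    labels = torch.randint(0, tasks[task], (8,))
    loss = loss_fn(model(task, input_ids), labels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

After this stage, the shared encoder (with the temporary heads discarded) would be fine-tuned on each downstream task in the usual way.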
Pages: 5799-5811
Number of pages: 13