Preventing Catastrophic Forgetting in Continual Learning of New Natural Language Tasks

Cited: 4
Authors
Kar, Sudipta [1 ]
Castellucci, Giuseppe [1 ]
Filice, Simone [2 ]
Malmasi, Shervin [1 ]
Rokhlenko, Oleg [1 ]
Affiliations
[1] Amazon, Seattle, WA 98109 USA
[2] Amazon, Tel Aviv, Israel
Source
PROCEEDINGS OF THE 28TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2022 | 2022
Keywords
continual learning; catastrophic forgetting; text classification
DOI
10.1145/3534678.3539169
CLC Number
TP [Automation technology, computer technology]
Subject Classification Code
0812
Abstract
Multi-Task Learning (MTL) is widely accepted in Natural Language Processing as a standard technique for learning multiple related tasks in one model. Training an MTL model requires the training data for all tasks to be available at the same time. As systems usually evolve over time (e.g., to support new functionalities), adding a new task to an existing MTL model usually requires retraining the model from scratch on all tasks, which can be time-consuming and computationally expensive. Moreover, in some scenarios, the data used to train the original model may no longer be available, for example, due to storage or privacy concerns. In this paper, we approach the problem of incrementally expanding an MTL model's capability to solve new tasks over time by distilling the knowledge of a model already trained on n tasks into a new one that solves n + 1 tasks. To avoid catastrophic forgetting, we propose to exploit unlabeled data drawn from the same distributions as the old tasks. Our experiments on publicly available benchmarks show that this technique dramatically benefits the distillation by preserving the already acquired knowledge (i.e., preventing performance drops of up to 20% on old tasks) while obtaining good performance on the incrementally added tasks. Further, we show that our approach is also beneficial in practical settings by using data from a leading voice assistant.
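The recipe the abstract describes can be sketched concretely: train the new (student) model with a supervised loss on the labeled data of the new task, plus a distillation loss that matches the student's predictions to the old (teacher) model's predictions on unlabeled inputs drawn from the old tasks' distribution. Below is a minimal, self-contained PyTorch sketch of this idea; the toy multi-head model, the equal loss weighting, and all names are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadModel(nn.Module):
    """Toy stand-in for a shared encoder with one classification head per task."""
    def __init__(self, dim=16, num_tasks=2, num_classes=3):
        super().__init__()
        self.encoder = nn.Linear(dim, dim)
        self.heads = nn.ModuleList(
            [nn.Linear(dim, num_classes) for _ in range(num_tasks)]
        )

    def forward(self, x, task):
        return self.heads[task](torch.relu(self.encoder(x)))

def kd_loss(student_logits, teacher_logits, T=2.0):
    # Soft-label KL divergence, scaled by T^2 (Hinton-style distillation).
    return F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

# The teacher was trained on the old task (task 0); the student must
# also handle the newly added task (task 1).
teacher = MultiHeadModel(num_tasks=1)
student = MultiHeadModel(num_tasks=2)
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)

# Labeled batch for the new task; unlabeled batch drawn from the old
# task's input distribution (no gold labels needed).
x_new, y_new = torch.randn(8, 16), torch.randint(0, 3, (8,))
x_old_unlabeled = torch.randn(8, 16)

optimizer.zero_grad()
loss_new = F.cross_entropy(student(x_new, task=1), y_new)
with torch.no_grad():
    teacher_logits = teacher(x_old_unlabeled, task=0)
loss_old = kd_loss(student(x_old_unlabeled, task=0), teacher_logits)
(loss_new + loss_old).backward()  # equal weighting here; tunable in practice
optimizer.step()
```

Because the teacher supplies the soft targets, the old tasks' original labeled data never needs to be stored, which is consistent with the storage and privacy constraints the abstract raises.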
Pages: 3137-3145
Page count: 9