An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness;
DOI
Not available
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also due to its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current-task loss and loss-basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
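The proposal in the abstract, jointly minimizing the current-task loss and the sharpness of its loss basin, follows the general recipe of sharpness-aware minimization: perturb the weights toward the local worst case within a small ball, then descend using the gradient measured at that perturbed point. The sketch below is a minimal, illustrative PyTorch rendering of that recipe applied to sequential fine-tuning, not the authors' released code; the `SAM` wrapper, `train_sequentially`, the radius `rho`, and the SGD base optimizer are assumptions made for this example.

```python
# Illustrative sketch: sharpness-aware sequential fine-tuning (SAM-style two-step update).
import torch
import torch.nn.functional as F


class SAM(torch.optim.Optimizer):
    """Wraps a base optimizer; minimizes the loss at an adversarially perturbed
    point inside an L2 ball of radius rho, which encourages wider (flatter) minima."""

    def __init__(self, params, base_optimizer_cls, rho=0.05, **kwargs):
        defaults = dict(rho=rho, **kwargs)
        super().__init__(params, defaults)
        self.base_optimizer = base_optimizer_cls(self.param_groups, **kwargs)

    @torch.no_grad()
    def first_step(self):
        # Climb to the approximate worst case: w <- w + rho * g / ||g||.
        grad_norm = torch.norm(torch.stack([
            p.grad.norm(p=2)
            for group in self.param_groups
            for p in group["params"] if p.grad is not None]), p=2)
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None:
                    continue
                e_w = p.grad * scale
                p.add_(e_w)                       # move to the perturbed point
                self.state[p]["e_w"] = e_w        # remember the perturbation

    @torch.no_grad()
    def second_step(self):
        # Undo the perturbation, then step with the gradients from the perturbed point.
        for group in self.param_groups:
            for p in group["params"]:
                if "e_w" in self.state[p]:
                    p.sub_(self.state[p]["e_w"])
        self.base_optimizer.step()


def train_sequentially(model, task_loaders, rho=0.05, lr=1e-3):
    """Fine-tune one (pre-trained) model on a sequence of tasks with the SAM update."""
    optimizer = SAM(model.parameters(), torch.optim.SGD, rho=rho, lr=lr)
    for loader in task_loaders:                   # tasks arrive one after another
        for x, y in loader:
            F.cross_entropy(model(x), y).backward()
            optimizer.first_step()                # ascend to the perturbed weights
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()  # loss at the perturbed weights
            optimizer.second_step()               # restore weights, take the SGD step
            optimizer.zero_grad()
    return model
```

Under this reading, the only continual-learning machinery is the flatter-minima objective itself; there is no replay buffer whose size grows with the number of tasks, which is consistent with the abstract's remark that the approach can sometimes dispense with memory that scales with the task count.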
Pages: 50