An Empirical Investigation of the Role of Pre-training in Lifelong Learning

Cited by: 0
Authors
Mehta, Sanket Vaibhav [1 ]
Patil, Darshan [2 ]
Chandar, Sarath [3 ]
Strubell, Emma [1 ]
Affiliations
[1] Carnegie Mellon Univ, Sch Comp Sci, Pittsburgh, PA 15213 USA
[2] Univ Montreal, Mila Quebec AI Inst, Montreal, PQ H3T 1J4, Canada
[3] Ecole Polytech Montreal, Mila Quebec AI Inst, Canada CIFAR AI Chair, Montreal, PQ H3T 1J4, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Lifelong Learning; Continual Learning; Pre-training; Flat Minima; Sharpness;
DOI
Not available
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The lifelong learning paradigm in machine learning is an attractive alternative to the more prominent isolated learning scheme not only due to its resemblance to biological learning but also due to its potential to reduce energy waste by obviating excessive model re-training. A key challenge to this paradigm is the phenomenon of catastrophic forgetting. With the increasing popularity and success of pre-trained models in machine learning, we pose the question: What role does pre-training play in lifelong learning, specifically with respect to catastrophic forgetting? We investigate existing methods in the context of large, pre-trained models and evaluate their performance on a variety of text and image classification tasks, including a large-scale study using a novel data set of 15 diverse NLP tasks. Across all settings, we observe that generic pre-training implicitly alleviates the effects of catastrophic forgetting when learning multiple tasks sequentially compared to randomly initialized models. We then further investigate why pre-training alleviates forgetting in this setting. We study this phenomenon by analyzing the loss landscape, finding that pre-trained weights appear to ease forgetting by leading to wider minima. Based on this insight, we propose jointly optimizing for current-task loss and loss-basin sharpness to explicitly encourage wider basins during sequential fine-tuning. We show that this optimization approach outperforms several state-of-the-art task-sequential continual learning algorithms across multiple settings, occasionally even without retaining a memory that scales in size with the number of tasks.
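The proposal in the abstract, jointly minimizing the current-task loss and the sharpness of its loss basin, follows the general recipe of sharpness-aware minimization: perturb the weights toward the local worst case within a small ball, then descend using the gradient measured at that perturbed point. The sketch below is a minimal, illustrative PyTorch rendering of that recipe applied to sequential fine-tuning, not the authors' released code; the `SAM` wrapper, `train_sequentially`, the radius `rho`, and the SGD base optimizer are assumptions made for this example.

```python
# Illustrative sketch: sharpness-aware sequential fine-tuning (SAM-style two-step update).
import torch
import torch.nn.functional as F


class SAM(torch.optim.Optimizer):
    """Wraps a base optimizer; minimizes the loss at an adversarially perturbed
    point inside an L2 ball of radius rho, which encourages wider (flatter) minima."""

    def __init__(self, params, base_optimizer_cls, rho=0.05, **kwargs):
        defaults = dict(rho=rho, **kwargs)
        super().__init__(params, defaults)
        self.base_optimizer = base_optimizer_cls(self.param_groups, **kwargs)

    @torch.no_grad()
    def first_step(self):
        # Climb to the approximate worst case: w <- w + rho * g / ||g||.
        grad_norm = torch.norm(torch.stack([
            p.grad.norm(p=2)
            for group in self.param_groups
            for p in group["params"] if p.grad is not None]), p=2)
        for group in self.param_groups:
            scale = group["rho"] / (grad_norm + 1e-12)
            for p in group["params"]:
                if p.grad is None:
                    continue
                e_w = p.grad * scale
                p.add_(e_w)                       # move to the perturbed point
                self.state[p]["e_w"] = e_w        # remember the perturbation

    @torch.no_grad()
    def second_step(self):
        # Undo the perturbation, then step with the gradients from the perturbed point.
        for group in self.param_groups:
            for p in group["params"]:
                if "e_w" in self.state[p]:
                    p.sub_(self.state[p]["e_w"])
        self.base_optimizer.step()


def train_sequentially(model, task_loaders, rho=0.05, lr=1e-3):
    """Fine-tune one (pre-trained) model on a sequence of tasks with the SAM update."""
    optimizer = SAM(model.parameters(), torch.optim.SGD, rho=rho, lr=lr)
    for loader in task_loaders:                   # tasks arrive one after another
        for x, y in loader:
            F.cross_entropy(model(x), y).backward()
            optimizer.first_step()                # ascend to the perturbed weights
            optimizer.zero_grad()
            F.cross_entropy(model(x), y).backward()  # loss at the perturbed weights
            optimizer.second_step()               # restore weights, take the SGD step
            optimizer.zero_grad()
    return model
```

Under this reading, the only continual-learning machinery is the flatter-minima objective itself; there is no replay buffer whose size grows with the number of tasks, which is consistent with the abstract's remark that the approach can sometimes dispense with memory that scales with the task count.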
Pages: 50